Elon Musk is one of the most closely watched businesspeople in the world, not only because of his unusual business acumen, his appetite for challenges, and the legendary entrepreneurial career that made him the world's richest man, but also because he is adept at using social networking service (SNS) platforms for social interaction. This study uses Python 3.11 to capture and filter Weibo articles about Musk on August 18th, 2023, and analyzes them logically against the chronology of related events in order to extract the characteristics of how Chinese social media perceives Musk. The paper finds that Chinese social media builds its image of Musk by reporting on and judging his career development and the hot issues surrounding him, and that this perception shifts as events involving him unfold. Chinese social media focuses on Tesla's intelligent driving, spacecraft, brain-neural technology, and social media itself. The perception of Musk's image in Weibo articles is polarized: extremely positive articles account for more than 60% and extremely negative articles for more than 10%.
The metal bush assembly process inserts and press-fits a metal bush, which serves to reduce noise and ensure stable compression in the rotating section. In this process, head diameter defects and placement defects occur because of metal bush omission, non-pressing, and poor press-fitting. Of these causes, this study aims to prevent defects caused by metal bush omission by using signals from sensors attached to the facility. Specifically, metal bush omission is predicted with several data mining techniques that use the left load cell value, right load cell value, current, and voltage as independent variables. Defect data for metal bush omission are difficult to obtain, which results in data imbalance. Data imbalance refers to a large difference in the number of samples belonging to each class, and it can be a problem for classification. To address this problem, oversampling and composite sampling techniques were applied in this study. In addition, simulated annealing (SA) was applied to optimize the sampling parameters and the hyper-parameters of the data mining techniques used for bush omission prediction. Metal bush omission was predicted using actual data from manufacturing company M, and the classification performance was examined. All applied techniques showed excellent results, and the proposed methods, which combine Random Forest with SA and MLP with SA, performed best.
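As a rough illustration of the oversampling idea in the abstract above, the following sketch balances a toy dataset by duplicating minority-class rows at random. The function name `random_oversample`, the toy sensor values, and the label coding (1 = omission) are all hypothetical; the abstract does not state which oversampling or composite sampling variants were used, so this is only the simplest member of that family.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to copy for this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy rows (assumed layout): left load cell, right load cell, current, voltage
rows = [
    (10.1, 9.8, 1.2, 24.0),
    (10.3, 10.0, 1.1, 24.1),
    (10.0, 9.9, 1.2, 23.9),
    (10.2, 10.1, 1.3, 24.0),
    (2.1, 1.9, 0.4, 24.0),  # the single hypothetical omission sample
]
labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only, otherwise duplicated rows can leak into the evaluation data and inflate the measured performance.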
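Military studies is attracting growing public interest amid a rapidly changing security environment and international situation, the development of weapon systems in the era of the Fourth Industrial Revolution, and changes to the military service system driven by low birth rates. This study therefore applies big-data-based text mining to analyze academic research trends in, and public perception of, military studies and to draw implications. The results show that academic research centered on relations with neighboring countries, weapon systems, the defense industry, and artificial intelligence, whereas public perception differed, centering on universities, departments of military studies, and officers. To advance military studies, the findings call for building research-centered capabilities and environments, interdisciplinary convergence research, industry-academia cooperation linked with local communities, and academic seminars and integrated research with public participation.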
This study was conducted to explore changes in market issues concerning HMR (Home Meal Replacements) made with local foods after the COVID-19 outbreak. Online text data were collected from internet news, social media posts, and web documents before (January 2016 to December 2019) and after (January 2020 to November 2022) the COVID-19 outbreak. TF-IDF analysis showed that ‘Trend’, ‘Market’, ‘Consumption’, and ‘Food service industry’ were the major keywords before the outbreak, whereas ‘Wanju-gun’, ‘Distribution’, ‘Development’, and ‘Meal-kit’ were the main keywords after it. Topic modeling and categorization showed that after the outbreak the ‘Market’ category included ‘Non-face-to-face market’ instead of ‘Event’ and ‘Delivery’ instead of ‘Distribution’. In the ‘Product’ category, ‘Marketing’ replaced ‘Trend’. In the ‘Support’ category, ‘Start-up’ and ‘School food service’ appeared as new topics after the outbreak. In conclusion, this study showed that meaningful changes have occurred in market issues concerning HMR made with local foods since the COVID-19 outbreak. Governments should therefore take advantage of this market opportunity by implementing policies and programs that promote the development and marketing of HMR made with local foods.
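The TF-IDF keyword ranking mentioned in the abstract above can be sketched in a few lines. This is an illustrative version only: it assumes the plain formulation tf = count / document length and idf = log(N / document frequency), whereas the study's exact weighting is not stated, and the tokenized toy documents are invented for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical, already-tokenized documents
docs = [
    ["hmr", "market", "trend", "consumption"],
    ["hmr", "meal-kit", "distribution", "development"],
    ["hmr", "meal-kit", "delivery", "market"],
]

scores = tf_idf(docs)
for i, doc_scores in enumerate(scores):
    top = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
    print(i, top)
```

A term that appears in every document (here `hmr`) gets a score of zero, which is why TF-IDF surfaces period-specific keywords rather than the search term itself.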
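Recent advances in GPS-based location collection technology and the explosive growth of GPS-equipped devices such as smartphones mean that enormous volumes of data on the geographic positions of moving objects, such as people, vehicles, ships, and aircraft, are being collected in real time. These data carry significant academic and practical value for understanding how objects move. Data mining methods for analyzing such data are advancing in parallel, and researchers are using trajectory data to explore mobility phenomena in cities and the relationships among the places that make up a city, thereby proposing solutions to a range of urban problems. Because trajectories can track the movement of many kinds of objects, their applications and purposes are equally diverse, and they are widely used in fields such as urban planning, transportation, behavioral ecology, public safety, anomaly and violation detection, and surveillance. In particular, recent progress in data mining methodology and deep learning has allowed diverse analytical methods to be combined in trajectory data analysis, yielding meaningful research results, so a systematic review of this work is needed. Against this background, this study classifies about 150 domestic and international studies that use trajectory data by application field and by methodology, and analyzes recent trends for each application field and each trajectory data analysis methodology. The results are expected to serve as a resource for exploring methodologies applicable to trajectory data, finding concrete cases of trajectory data analysis, and deriving application services based on trajectory data.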
The purpose of this paper is to identify the key factors for efficient maintenance of rapidly aging facilities. To this end, safety inspection and diagnosis reports accumulated as unstructured data were collected and preprocessed, and then analyzed with text mining. In the short term, the derived vulnerabilities of tunnel facilities can be used as inspection items that reflect the characteristics of individual facilities during regular and daily inspections. In the long term, if detailed specification information and other inspection results (safety, durability, and ease of use) are added to the analysis, the approach provides a stepping stone toward supporting preemptive maintenance decision-making.
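To understand the academic literature on financial education in Korea more objectively, this study infers the main topics from keywords extracted from paper abstracts and examines the overarching discourses. To improve research efficiency and support repeatable follow-up studies, a big data analysis technique (text mining with LDA) was used to extract the words associated with each main topic. After preprocessing a total of 208 relevant papers, LDA topic modeling was performed on the 1,201 highest-frequency nouns out of 32,523 extracted nouns, which yielded 16 topic groups. The most frequent word was "financial literacy," followed by "student" and "financial consumer," and the common element across the inferred topics was an interest in the process of providing or delivering education to schools and students. In defining the key texts and topics, the study found that attention to learners' social needs and to financial education for adults is insufficient, which implies that the scope of research needs to be expanded continuously.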
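This study applied data-mining-based decision tree analysis to explore the sports consumption styles of Generation Z, with the aim of providing baseline data for predicting the sports consumption market that Generation Z will lead. Men and women aged 19 or older within Generation Z were sampled for the main survey, and data from 429 respondents were used in the final analysis. The data were processed with SPSS Statistics (ver. 21.0) using frequency analysis, exploratory factor analysis, test-retest reliability and reliability analysis, and decision tree analysis. The main results are as follows. First, respondents with a high rational-efficiency index and a low aesthetic-consumption index were classified into the female group with a probability of 96.8%, whereas respondents with low rational-efficiency and price-orientation indices were classified into the male group with a probability of 100%. Second, respondents with high brand-orientation, price-orientation, and rational-efficiency indices were classified into the Seoul metropolitan area group with a probability of 97.3%. In contrast, respondents with low brand-orientation, commemorative-ritual, and status-symbol indices were classified into the other-regions group with a probability of 82.1%. Third, respondents with high status-symbol and trend-orientation indices and a low functionality index were classified into the daily life and fashion group with a probability of 77.6%. Conversely, respondents with a low status-symbol index and high belonging-maintenance and consumption-enjoyment indices were classified into the exercise and competition group with a probability of 81.0%.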
After decades of vigorous development, data mining technology has achieved fruitful theoretical and applied results. As a highly applicable discipline, data mining has penetrated many fields of the national economy and has attracted great attention from academia and industry. Electronic chart databases store large amounts of chart data, which are used very widely and provide a valuable basis for decisions by managers in all walks of life. It is therefore of great significance to establish a complete data management mechanism based on data mining technology. Because of its weak data association index and poor data association ability, traditional data analogy extraction technology produces extracted data that differ from the target data. This paper therefore studies the application of data mining technology to electronic chart data management. The approach uses rough sets to obtain the basic information of electronic chart data management according to a similarity function and mines the association rules of electronic chart data management. It then builds a rule base through a comprehensive evaluation data system for electronic chart data management and sets up evaluation indices, which makes it possible to evaluate the similarity of the mining results. Experimental results show that, compared with traditional data analogy extraction technology, the results obtained by data mining technology are more similar to the target data and meet the requirements of electronic chart data management acquisition. This technology is therefore better suited to electronic chart data management.
People write reviews of numerous products and services on the Internet, in their blogs or on community bulletin boards. These unstructured data contain the authors' emotions and opinions about a product or service, which can provide important information for future product design or marketing. However, this text-based information cannot be evaluated quantitatively, so it is difficult to apply to mathematical models or optimization problems for product design and improvement. This study therefore proposes a method for quantitatively extracting users' opinions or preferences about a specific product or service from the large volume of text-based information available online. The extracted unstructured text is decomposed into basic word units, and a positive rate is evaluated using existing emotional dictionaries together with additional lists proposed in this study. This offers a way to make effective use of unstructured text data, which are being generated and stored in vast quantities, in product or service design. Finally, to verify the effectiveness of the proposed method, a case study was conducted using movie review data retrieved from a portal website. By comparing the positive rates calculated by the proposed framework with user ratings for the movies, the study provides a guideline for text-mining-based evaluation of unstructured data.
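The dictionary-based positive rate described in the abstract above might be computed along the following lines. This is a minimal sketch under stated assumptions: the two word sets stand in for the emotional dictionaries and additional lists (their real contents are not given), tokenization is a simple lowercase regular expression, and the positive rate is assumed to be positive matches divided by all sentiment matches, a definition the abstract does not spell out.

```python
import re

# Hypothetical stand-ins for the emotional dictionaries used in the study
POSITIVE = {"great", "good", "moving", "excellent"}
NEGATIVE = {"boring", "bad", "weak", "terrible"}


def positive_rate(reviews):
    """Share of sentiment-bearing words that are positive.

    Returns None when no word matches either dictionary.
    """
    pos = neg = 0
    for text in reviews:
        # Decompose the review into basic word units
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    total = pos + neg
    return pos / total if total else None


reviews = [
    "Great acting and a moving story",
    "Boring plot, weak ending",
    "Good soundtrack",
]

rate = positive_rate(reviews)
print(rate)  # 3 positive matches out of 5 sentiment words
```

Comparing such a rate with the users' star ratings for the same movies, as the study does, is one way to check whether the dictionaries capture the intended sentiment.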
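This study generated trajectory data from Flickr, a type of social network service, and analyzed the movement characteristics of tourists who visited Seoul. The study used 39,157 Flickr photos posted by 1,476 tourists who visited Seoul between January 1, 2015 and December 31, 2017. During the study period, tourists stayed an average of 5.12 days per visit and visited about 1.27 times. The first places visited were, in order, Jongno/Namsan, Sinchon/Hongdae, and Itaewon; the main destination was Jongno/Namsan, and tourists mostly moved on to adjacent areas. The data and methodology used in this study are expected to make the analysis of tourist behavior more efficient and to enable analysis from multiple angles.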
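This paper uses data mining techniques to analyze the defect rate of automotive semiconductors that arises from the usage environment after product shipment. The automobile, the most common means of transportation since the 20th century, is evolving very rapidly as the use of electronic control units and automotive semiconductors increases sharply.
Automotive semiconductors are core components of vehicle electronic control units and are used to give consumers safety, fuel efficiency, and driving stability. They are applied in many areas, including the control of gasoline engines, diesel engines, and electric motors, head-up displays, and lane-keeping systems. Semiconductors are thus used in nearly every electronic control unit in a vehicle and produce effects that go beyond a simple combination of mechanical devices.
Considering a vehicle service life of 10 years or more, automotive semiconductors must offer high reliability, durability, and long-term supply, because their reliability is directly tied to vehicle safety. The semiconductor industry evaluates the reliability of automotive semiconductors using industry standards such as those of JEDEC and AEC. It also predicts product lifetime in the early development and early mass-production stages using the reliability test methods specified as standards by the automotive industry and their results. However, there are limits to predicting the defect rate caused by variables such as customers' diverse usage conditions and usage times. Academia and industry have carried out much research to overcome these limits. Research using data mining techniques is under way in many semiconductor fields, but its application to automotive semiconductors remains scarce.
From this perspective, this study uses data mining techniques to identify the associations among data generated during the semiconductor assembly and package test processes, and uses customer defect data to verify which data mining techniques are suitable for predicting the latent defect rate.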
Current evaluation practices for IT projects suffer from several problems, including evaluation results that are difficult to explain on their own and an improperly scaled scoring system. This study aims to develop an opinion mining methodology that extracts key factors for causal relationship analysis, and to assess the feasibility of quantifying evaluation scores from text comments using opinion mining based on big data analysis. The research was performed on publicly procured IT proposal evaluations, which are managed by the National Procurement Service. Around 10,000 sets of comments and evaluation scores were gathered, most in digital form but some as paper documents, so a more refined form of the text was prepared using various tools. Keywords for factors and polarity indicators were extracted from this text, and domain experts selected some of them as the key factors and indicators. The keywords were also grouped into dimensions. Causal relationships between keyword or dimension factors and evaluation scores were analyzed with two research models, a keyword-based model and a dimension-based model, using correlation analysis and regression analysis. The results show that keyword factors such as planning, strategy, technology, and PM have the greatest effect on the evaluation result, and that keywords are more appropriate than dimensions as factors for causal relationship analysis. The analysis also suggests that evaluation scores can be composed or calculated from unstructured text comments using opinion mining, provided that a comprehensive polarity dictionary for the Korean language is available. This study may contribute to big-data-based evaluation methodology and opinion mining for IT proposal evaluation, leading to a more reliable and effective IT proposal evaluation method.
Recent developments in science and technology have modernized the weapon systems of the ROKN (Republic of Korea Navy). Although the cost of purchasing, operating, and maintaining cutting-edge weapon systems has increased significantly, national defense expenditure is under a tight budget constraint. To maintain ship availability at low cost, accurate demand forecasts for spare parts are needed. We attempted to find consumption patterns using data mining techniques. First, we gathered a large amount of component consumption data through DELIIS (Defense Logistics Integrated Information System). From this data collection we obtained 42 variables, such as annual consumption quantity, ASL selection quantity, and order-release ratio. The objective variable is the quantity of spare parts purchased in f-year, and MSE (mean squared error) is used as the measure of predictive power. To construct an optimal demand forecasting model, regression tree, random forest, neural network, and linear regression models were used as data mining techniques. The open-source software R was used for model construction. The results show that the random forest model achieved the best MSE. The important variables in all models are consumption quantity, ASL selection quantity, and order-release ratio. Our approach forecasts demand more accurately than previous work. Data mining can also be used to identify variables that are related to demand forecasting.
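The model-selection step in the abstract above, choosing the model with the lowest mean squared error, can be illustrated as follows. The study itself used R; this Python sketch only shows the arithmetic. The demand figures and the model names `model_a` and `model_b` are invented placeholders and do not reproduce any result from the study.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical spare-part demand for four items and two models' forecasts
actual = [12, 0, 5, 30]
predictions = {
    "model_a": [10, 2, 9, 22],
    "model_b": [11, 1, 6, 28],
}

errors = {name: mse(actual, pred) for name, pred in predictions.items()}
best = min(errors, key=errors.get)  # lowest MSE wins

print(errors)
print(best)
```

Because MSE squares each error, a single badly missed item (such as the 30-unit demand forecast as 22 by `model_a`) dominates the score, which matters for spare parts with occasional large orders.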
Our research aims to identify recent trends and the leading technologies of the future, and to provide optimal nanotechnology trend information, by analyzing nanotechnology trends. In the current global market, users' needs and the technologies that meet them are changing in real time. Nanotechnology likewise needs measures to reduce cost and enhance efficiency in order not to fall behind. Research such as trend analysis based on search data, which can satisfy both aims, is therefore required. This research consists of four steps. We collect data and select keywords in step 1, detect trends based on frequency and create visualizations in step 2, and perform analysis using data mining in step 3. This research can be used to look for changes in trends from three perspectives. First, it analyzes trend changes in terms of major classification, the 30 nanotechnologies, and the keywords that make up each relevant nanotechnology. Second, it can provide real-time information: because search data carry real-time information, trend analysis based on them can track a continuously changing market. Third, comparative analysis makes it possible to establish useful corporate policy and strategy by understanding trends in the United States, where nanotechnology is relatively advanced. Trend analysis using search data, as in this research, can therefore suggest policy directions that respond to market changes in real time, serve as reference material, and help reduce cost.
The mortality rate from industrial accidents in South Korea was 11 per 100,000 workers in 2015, five times the OECD average. Economic losses from industrial accidents continue to grow and have reached 19 trillion won, far more than the 1.1 trillion won lost to natural disasters. This calls for fundamental changes in industrial safety management. In this study, we classified accident risk in the industrial complexes of Ulju-gun using spatial analytics and data mining. We collected 119 accident data, factory characteristics data, company information such as sales and capital stock, building information, weather information, officially assessed land prices, and other data. The analysis dataset was constructed through pre-processing and data integration. We then performed geographically weighted regression on the spatial factors affecting fire incidents and calculated fire accident risk with an analytical model that combines Boosting and CART (Classification and Regression Tree). The main factors found to affect fire accidents are building deterioration, capital stock, number of employees, officially assessed land price, and building height. Finally, the predicted accident rates were divided into four classes (risk categories: alert, hazard, caution, and attention) using Jenks Natural Breaks Classification. This method seeks to minimize each class's average deviation from the class mean while maximizing each class's deviation from the means of the other classes. Because the results were also visualized on maps, danger zones can be identified intuitively. The results are expected to support policy decisions tailored to each risk rating.
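The Jenks Natural Breaks step in the abstract above can be sketched with a small dynamic program. This is an illustrative version that assumes the common Fisher-Jenks formulation, which minimizes the total within-class sum of squared deviations from each class mean; the function name `jenks_classes` and the toy accident rates are hypothetical, and the study's own implementation is not described.

```python
def jenks_classes(values, k):
    """Split values into k contiguous classes (after sorting) that minimize
    the total within-class sum of squared deviations from the class mean."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums let us compute a segment's squared deviation in O(1)
    s1, s2 = [0.0], [0.0]
    for x in data:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its mean (j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk the split table backwards to recover the classes
    classes, j = [], n
    for c in range(k, 0, -1):
        i = split[c][j]
        classes.append(data[i:j])
        j = i
    return classes[::-1]


# Hypothetical predicted accident rates for eight sites
rates = [0.61, 0.02, 0.30, 0.95, 0.05, 0.65, 0.32, 0.03]
classes = jenks_classes(rates, 4)
for group in classes:
    print(group)
```

The four resulting groups, ordered from lowest to highest rate, would then be mapped onto the study's four risk categories.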