
A Comparative Analysis of Algorithms for Efficient Clustering of Big Data

Comparing and Analyzing on Algorithms for the Effective Clustering of Big Data

  • Language: KOR
  • URL: https://db.koreascholar.com/Article/Detail/321189
대한안전경영과학회 (Korea Safety Management & Science)
Abstract

As the Internet has spread widely and its technology has advanced, computer use has become routine and most data are now stored on computers. Accordingly, many companies and researchers have tried to find relationships within these enormous volumes of data. One approach is to use a clustering algorithm, which finds groups of similar data within the entire data set and uncovers their common properties. Early clustering algorithms ran in a computer's main memory; a representative example is PAM (Partitioning Around Medoids), which compensates for the weaknesses of the k-means algorithm. PAM performs clustering using medoids of the data instead of means. PAM works well on small data sets but is difficult to apply to large ones. CLARA (Clustering LARge Applications) was therefore introduced for large data sets. This algorithm draws samples from a large data set and applies PAM to the sampled data. CLARA is limited by the fact that the samples are fixed at each clustering stage, and if good medoids are not included in the sample, the clustering result is poor. CLARANS (Clustering Large Applications based upon RANdomized Search) overcomes these problems by drawing samples with some randomness. This algorithm performs clustering using the set of k medoids extracted during the clustering process at each stage. The main objective of this paper is to compare and analyze the algorithms popularly used for clustering big data.
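
To make the relationship between the three algorithms concrete, the following is a minimal Python sketch. It is not the authors' implementation. The function names, the parameters (`n_samples`, `sample_size`, `num_local`, `max_neighbor`), and the toy data are assumptions chosen for illustration, and the PAM shown here is a simplified greedy swap search, not the full BUILD/SWAP procedure.

```python
import math
import random

def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid (medoids are indices)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist, seed=0):
    """Simplified PAM: keep swapping a medoid for a non-medoid while the cost drops."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids, best

def clara(points, k, dist, n_samples=5, sample_size=40, seed=0):
    """CLARA sketch: run PAM on random samples, score each result on the full data."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        idx = rng.sample(range(len(points)), min(sample_size, len(points)))
        sample = [points[i] for i in idx]
        local, _sample_cost = pam(sample, k, dist, seed=rng.randrange(10**9))
        medoids = [idx[m] for m in local]          # map back to full-data indices
        cost = total_cost(points, medoids, dist)   # evaluate on the whole data set
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

def clarans(points, k, dist, num_local=3, max_neighbor=50, seed=0):
    """CLARANS sketch: randomized search over single-medoid swaps (assumes k < n)."""
    rng = random.Random(seed)
    n = len(points)
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_local):
        current = rng.sample(range(n), k)
        cost = total_cost(points, current, dist)
        tries = 0
        while tries < max_neighbor:
            i = rng.randrange(k)
            cand = rng.choice([j for j in range(n) if j not in current])
            trial = current[:i] + [cand] + current[i + 1:]
            trial_cost = total_cost(points, trial, dist)
            if trial_cost < cost:
                current, cost, tries = trial, trial_cost, 0  # move and reset counter
            else:
                tries += 1
        if cost < best_cost:
            best_medoids, best_cost = current, cost
    return best_medoids, best_cost

# Toy data (hypothetical): two well-separated groups of four 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
for name, algo in [("PAM", pam), ("CLARA", clara), ("CLARANS", clarans)]:
    medoids, cost = algo(points, 2, math.dist)
    print(name, sorted(medoids), round(cost, 3))
```

In this sketch PAM evaluates every possible swap against the whole data set, CLARA restricts that search to a sample and only scores the result on the full data, and CLARANS examines randomly chosen swaps over the full data, which reflects the trade-off the abstract describes.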

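Authors
  • 이순근 (Department of Industrial Information and Management Engineering, Gangneung-Wonju National University)
  • 임영문 (Department of Industrial Information and Management Engineering, Gangneung-Wonju National University)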