Data on patents and scientific papers are regarded as a useful source of technological information and have been widely used. Technology big data is analyzed in various ways to identify the latest technological trends and to predict promising future technologies. Clustering is one way to discover new features, by forming groups within technology big data. Patents include refined bibliographic information, such as patent classification codes, whereas scientific papers lack bibliographic information suitable for clustering. This research proposes a new approach for clustering scientific papers that uses the titles of the references cited in each paper. In this approach, the reference titles are treated as textual information, because each reference contains the title of the cited paper, and that title represents the cited paper's core content. We collected scientific paper data, extracted the reference titles, and clustered the papers by measuring text-based similarity. The results of the proposed approach are compared with those of two existing methodologies: one that uses textual information from titles and abstracts, and one that is citation-based. The proposed approach shows a statistically significant difference from the existing approaches and achieves better clustering performance. It can therefore serve as a useful method for clustering scientific papers.
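The pipeline described above (extract reference titles, measure text-based similarity, cluster) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering algorithm was used, so the Jaccard similarity over title words, the threshold-based single-link grouping, the function names, and the sample data below are all assumptions chosen for illustration, not the authors' method.

```python
import re
from itertools import combinations


def title_words(reference_titles):
    """Collect the lowercase words appearing in a paper's reference titles."""
    words = set()
    for title in reference_titles:
        words.update(re.findall(r"[a-z]+", title.lower()))
    return words


def jaccard(a, b):
    """Jaccard similarity of two word sets (assumed measure, not from the paper)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_papers(papers, threshold=0.3):
    """Group papers whose reference-title word sets are similar.

    papers: dict mapping a paper id to a list of its reference titles.
    Papers are linked when their similarity reaches the threshold, and
    linked papers end up in the same cluster (single-link grouping).
    """
    ids = sorted(papers)
    bags = {p: title_words(papers[p]) for p in ids}
    parent = {p: p for p in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if jaccard(bags[a], bags[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in ids:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Hypothetical sample data, invented for illustration only.
example_papers = {
    "P1": ["Deep learning for image recognition",
           "Convolutional networks for image classification"],
    "P2": ["Image classification with deep convolutional networks"],
    "P3": ["Patent citation analysis of technology trends"],
    "P4": ["Technology trends from patent citation networks"],
}

if __name__ == "__main__":
    print(cluster_papers(example_papers, threshold=0.3))
```

With the sample data and a threshold of 0.3, P1 and P2 fall into one cluster and P3 and P4 into another. A real study would likely use a weighted representation such as TF-IDF and a dedicated clustering algorithm; the word-set version is used here only to keep the sketch self-contained.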
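The way scholarly information is used has recently been shifting from print media to digital materials. If digital technology allows scholarly information to be shared quickly and freely online, the progress of scholarship could accelerate further. This first requires, however, removing the legal and institutional obstacles that arise when the goals of the various stakeholders involved in producing, funding, disseminating, and using scholarly information conflict. This article examines the current state of digital scholarly information distribution in Korea and related legal disputes, and then reviews "Open Access", which has recently drawn attention as a way to resolve these problems, together with current open access initiatives in Korea. Open access means that authors allow any user to access the full text of scholarly articles over the Internet free of charge, and to read, download, copy, distribute, print, search, and link to them, without financial, legal, or technical barriers. Non-profit scholarly works are protected by copyright law because they are creative works expressing human thoughts or emotions, but for them to contribute to the advancement of scholarship, they must be available for fair and free use. If Korea also expands open access to non-profit scholarly information, starting with the public sector, it can satisfy the interests of both authors and users while contributing to the development of scholarship.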
Researchers in science and engineering now attach great importance to problems concerning scientific information. There are several reasons for this: (i) the amount of scientific information grows in proportion to the activities of scientists and engineers, so it is difficult to pick out the information that is truly valuable; (ii) using a variety of information becomes more important as the branches of science spread; and (iii) since the medium of scientific information is mostly technical papers, it is very difficult to process these papers mechanically and to keep all documents and scientific information for a long time. To cope with these difficulties, many techniques have been developed, for example databases, automatic information retrieval, and microfilm. However, there have been comparatively few investigations of how researchers, who are both users and producers of scientific information, think about the systematization of its use. This paper therefore investigates the views and information needs of researchers and proposes a basis for a method of systematizing scientific information usage. The author examines the actual conditions of scientific information, reconsiders the methods used so far, and investigates, by means of a questionnaire based on the Fuzzy-DEMATEL method, how researchers whose interests are closely related to the study of marine affairs think about the problems of scientific information usage. In addition, FSM, a method based on fuzzy set theory for structuring complex problems into a hierarchy, is adopted as an analysis tool. The results of the analysis identify the key problems and outline a path toward systematizing scientific information usage, and they are directly applicable to constructing a new system for scientific information usage.
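To give a sense of the kind of analysis DEMATEL supports, the sketch below computes the total-relation matrix and the prominence and relation scores from a matrix of pairwise influence ratings. It implements the crisp (non-fuzzy) DEMATEL as a simplification; the paper uses Fuzzy-DEMATEL, whose questionnaire scale and fuzzy aggregation are not described in the abstract. The function names and the 2x2 influence scores are hypothetical, and the series-based computation assumes the normalized matrix has spectral radius below 1.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_relation(direct, tol=1e-12, max_iter=10_000):
    """Crisp DEMATEL total-relation matrix T = N + N^2 + N^3 + ...

    direct: square matrix where direct[i][j] rates how strongly factor i
    influences factor j (at least one entry must be nonzero).
    N is the direct matrix divided by the largest row or column sum.
    The series equals N (I - N)^-1 when it converges, which is assumed here.
    """
    n = len(direct)
    row_max = max(sum(row) for row in direct)
    col_max = max(sum(direct[i][j] for i in range(n)) for j in range(n))
    scale = max(row_max, col_max)
    norm = [[v / scale for v in row] for row in direct]

    total = [row[:] for row in norm]
    term = [row[:] for row in norm]
    for _ in range(max_iter):
        term = matmul(term, norm)  # next power of N
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
        if max(abs(v) for row in term for v in row) < tol:
            break
    return total


def prominence_relation(total):
    """Return (prominence, relation) per factor.

    prominence = row sum + column sum (overall involvement);
    relation = row sum - column sum (positive: net cause, negative: net effect).
    """
    n = len(total)
    rows = [sum(total[i]) for i in range(n)]
    cols = [sum(total[i][j] for i in range(n)) for j in range(n)]
    return [(rows[i] + cols[i], rows[i] - cols[i]) for i in range(n)]


# Hypothetical influence ratings between two factors, for illustration only.
example_direct = [
    [0, 2],
    [1, 0],
]

if __name__ == "__main__":
    t = total_relation(example_direct)
    print(t)
    print(prominence_relation(t))
```

In this toy example the first factor has a positive relation score and so acts as a net cause, while the second acts as a net effect. A fuzzy variant would replace each rating with a fuzzy number and defuzzify before or after this computation.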