Machine Learning Based Artificial Intelligence Patent Quality Prediction

Sunghyun Kim; Changhun Ok; Youngmin Kim

KCI listed

Language: KOR
URL: https://db.koreascholar.com/Article/Detail/428482


Journal of Technology Innovation (기술혁신연구)

Vol. 31, No. 4 (November 2023)
pp. 61-82

The Korea Society for Innovation Management & Economics (기술경영경제학회)

Abstract

Since the framework of the Fourth Industrial Revolution was introduced, artificial intelligence has gradually become a ubiquitous technology, and the number of AI-related patent applications has increased significantly. Recently, the patent ecosystem has shifted from quantitative competition based on the number of applications to qualitative competition focused on securing high-quality patents, and concern about the costs incurred by low-quality patents has grown accordingly. Against this background, this study proposes a method for predicting patent quality using machine learning and the Doc2Vec algorithm. We used CPC codes defined by WIPO to extract AI-related patents registered at the United States Patent and Trademark Office (USPTO), and from these data we developed 19 variables based on structured data and 7 variables based on unstructured data. In particular, we newly propose similarity variables computed with Doc2Vec from the title and abstract texts, which we expect to influence the prediction of high-quality patents. To verify their effect, we developed and compared ensemble-based machine learning models with and without the similarity variables. In the experiments, the model that includes the similarity variables performed better: its AUC was higher by 0.013 and its f1-score higher by 0.025. This suggests that the similarity variables contribute to predicting high-quality patents. We also used SHAP to explain the influence of each variable in the black-box machine learning model. We expect this study to contribute to realizing social value by predicting patent quality in key technology fields such as artificial intelligence and by encouraging the development of high-quality patents.
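The comparison reported in the abstract (gains of 0.013 in AUC and 0.025 in f1-score) can be illustrated with a minimal sketch. This is not the authors' code: the labels and scores are hypothetical toy values, the function names `roc_auc` and `f1_at_threshold` are invented for this illustration, and the 0.5 decision threshold is an assumption. The sketch only shows how the two metrics might be computed for a model with and without an extra feature set, and how the differences are read.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 of the positive class after thresholding scores (threshold is assumed)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = high-quality patent, 0 = otherwise.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical predicted probabilities from two models.
baseline_scores = [0.9, 0.4, 0.6, 0.5, 0.2, 0.7, 0.3, 0.1]
with_similarity_scores = [0.9, 0.6, 0.7, 0.4, 0.2, 0.55, 0.45, 0.1]

auc_gain = roc_auc(labels, with_similarity_scores) - roc_auc(labels, baseline_scores)
f1_gain = (f1_at_threshold(labels, with_similarity_scores)
           - f1_at_threshold(labels, baseline_scores))

print(f"AUC gain from similarity variables: {auc_gain:.4f}")
print(f"F1 gain from similarity variables:  {f1_gain:.4f}")
```

The toy gains are far larger than those reported in the abstract; only the way the difference is calculated carries over.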

Keywords

Patent quality prediction; Machine learning; AI; Doc2Vec; Similarity

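Ⅰ. Research Background and Purpose
Ⅱ. Related Work
    1. Patent Quality Research in the Artificial Intelligence Field
    2. Patent Quality Prediction through Machine Learning
    3. Patent Text Analysis
Ⅲ. Data and Variables
    1. Data
    2. Variables
Ⅳ. Model
    1. Doc2Vec-Based Similarity
    2. Performance Evaluation
    3. Variable Importance
Ⅴ. Conclusion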

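Authors

Sunghyun Kim (Ph.D. student, Graduate School of Technology Management, Hanyang University)
Changhun Ok (Ph.D. student, Graduate School of Technology Management, Hanyang University)
Youngmin Kim (Professor, Graduate School of Technology Management, Hanyang University)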
