Development of a Text-Guided Annotation System for Object Segmentation Training Data in Crop Images Based on the Segmentation Anything Model 3

Seungwon Seok; Hye-Sung Park; Solhee Kim; Youngmi Lee; Taegon Kim

KCI-listed

Language: Korean
URL: https://db.koreascholar.com/Article/Detail/449319

Journal of Bio-Environment Control

Vol.35 No.1 (2026.01)
pp.26-39

The Korean Society For Bio-Environment Control

Abstract

With the widespread adoption of computer vision technology in agriculture, securing high-quality training data has become essential. However, existing manual data construction methods require substantial time and cost. In this work, we develop a semi-automatic annotation system built on SAM3 (Segment Anything Model 3), a state-of-the-art multimodal foundation model. The proposed system is implemented as a GUI and consists of three stages: (1) object recognition based on text prompts, (2) SAM3-based generation of precise masks and their conversion into trainable polygon coordinates, and (3) quality assurance through user verification. In an evaluation on 600 images, SAM3 achieved a matching rate of 92.9% and a mean Average Precision (mAP) of 0.790, and reduced dataset construction time by 96 to 98% compared to manual annotation. These results show superior accuracy and efficiency compared to baseline foundation model combinations such as SAM+CLIP and Grounding DINO+SAM. This study highlights how the zero-shot capabilities of foundation models can improve agricultural data labeling efficiency and accelerate related AI research.
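Stage (2) of the system converts masks into trainable polygon coordinates. The abstract does not state which label format the authors use, so the following is only a minimal sketch under an assumption: it serializes one polygon as a text line of the form `class x1 y1 x2 y2 ...` with coordinates normalized by image size, a convention common in segmentation training tools. The function name `polygon_to_label_line`, its parameters, and the sample triangle are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_label_line(
    class_id: int,
    polygon: List[Point],
    img_w: int,
    img_h: int,
    ndigits: int = 6,
) -> str:
    """Serialize one polygon as 'class x1 y1 x2 y2 ...' with coordinates in [0, 1].

    Assumed label format for illustration only; not confirmed by the paper.
    """
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least three vertices")
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image size must be positive")

    coords = []
    for x, y in polygon:
        # Normalize by image size and clamp so border pixels stay inside [0, 1].
        nx = min(max(x / img_w, 0.0), 1.0)
        ny = min(max(y / img_h, 0.0), 1.0)
        coords.append(f"{nx:.{ndigits}f}")
        coords.append(f"{ny:.{ndigits}f}")
    return f"{class_id} " + " ".join(coords)


# Hypothetical example: a triangular outline in a 640x480 image.
line = polygon_to_label_line(0, [(64, 48), (320, 48), (320, 240)], 640, 480)
print(line)
```

Normalizing by width and height keeps the label independent of image resolution, which is one plausible reason to store polygons this way; the pixel-space polygon itself would come from tracing the outline of the predicted mask, a step not shown here.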

Keywords

agricultural dataset; computer vision; deep learning; foundation model; image annotation

Abstract
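Introduction
Materials and Methods
    1. Dataset Construction
    2. CLIP (Contrastive Language-Image Pre-training)
    3. Grounding DINO
    4. SAM (Segmentation Anything Model) and SAM3
    5. Design and Implementation of the SAM3-Based Annotation System
    6. Segmentation Models and Evaluation Pipeline
    7. Evaluation Metrics
Results and Discussion
    1. Comparative Evaluation of Text-Based Segmentation Models
    2. Development and Application of the SAM3-Based Semi-Automatic Annotation System
Discussion
Summary
Acknowledgments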
Literature Cited

Authors

Seungwon Seok (Master's Student, Department of Agricultural Engineering, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea)
Hye-Sung Park (Researcher, Mushroom Research Division, National Institute of Horticultural and Herbal Science, Eumseong 27790, Korea)
Solhee Kim (Associate Research Professor, Department of Smart Farm, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea)
Youngmi Lee (Assistant Professor, Department of Statistics, College of Natural Science, Jeonbuk National University, Jeonju 54896, Korea; Researcher, Research Institute of Applied Statistics, Jeonbuk National University, Jeonju 54896, Korea)
Taegon Kim (Associate Professor, Department of Smart Farm, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54896, Korea; Researcher, Institute of Agricultural Science & Technology, Jeonbuk National University, Jeonju 54896, Korea), corresponding author
