Underwater Sonar Image Classification Based on Vision Transformers and Metric Learning
Underwater sonar image classification is essential for maritime surveillance, autonomous navigation, and underwater target identification, where optical sensing is often restricted by turbidity and light attenuation. To enhance the robustness of sonar-based perception under such challenging conditions, this study proposes a metric-enhanced Vision Transformer (ViT) framework that integrates Siamese-based representation alignment with distance-regularized classification. In the first stage, a Siamese pre-training strategy aligns the embeddings of positive pairs, encouraging directionally consistent representations that improve class separability even under severe noise and viewpoint variation. In the second stage, the pretrained ViT encoder is frozen, and five classifiers (Linear, Cosine, and Proxy heads, together with Mahalanobis-regularized variants of the latter two) are systematically evaluated to investigate the effect of embedding normalization and distributional alignment. Experimental results on the UATD dataset demonstrate that the Siamese-trained ViT produces more stable and discriminative features than both ResNet-50 and a standard ViT-S. Among the classifiers, the Mahalanobis-regularized cosine classifier achieves the highest accuracy, with markedly fewer misclassifications between visually similar classes such as cube and square cage. Overall, the proposed approach highlights the effectiveness of combining ViT with metric learning and covariance-aware distance normalization for underwater sonar image recognition. The results suggest that metric-enhanced transformers offer a robust and generalizable foundation for sonar-based perception in real maritime environments.
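To make the first-stage objective concrete, the following is a minimal PyTorch sketch of a positive-pair alignment loss. The abstract describes pulling embeddings of positive pairs toward directional consistency but does not give the exact loss, so a negative cosine similarity between the two views is assumed here; the names `vit`, `view1`, and `view2` are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def siamese_alignment_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity between embeddings of a positive pair.

    z1, z2: (batch, dim) embeddings of two views of the same sonar image.
    Minimizing this pulls each pair toward the same direction on the unit
    hypersphere, i.e., directionally consistent representations.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    return -(z1 * z2).sum(dim=-1).mean()

# Hypothetical training step: `vit` is any encoder mapping images to
# (batch, dim) embeddings; `view1`/`view2` are two augmented views of a batch.
# loss = siamese_alignment_loss(vit(view1), vit(view2))
```

A pure alignment loss like this typically relies on the pairing scheme (or an asymmetric predictor, as in SimSiam-style setups) to avoid representational collapse; the sketch shows only the alignment term the abstract describes.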
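The abstract also does not spell out the exact form of the Mahalanobis regularization, so the sketch below shows one plausible reading: embeddings are whitened with the inverse square root of a feature covariance estimated on the training set, and a scaled cosine head is applied on top. The class `MahalanobisCosineClassifier` and the parameters `cov`, `scale`, and `eps` are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MahalanobisCosineClassifier(nn.Module):
    """Cosine classifier on Mahalanobis-whitened embeddings (a sketch).

    Features are first whitened with Sigma^{-1/2}, where Sigma is a fixed
    feature covariance estimated on the training set, so the cosine score
    between a feature and a class weight behaves like a covariance-aware
    (Mahalanobis-style) angular distance.
    """

    def __init__(self, dim: int, num_classes: int, cov: torch.Tensor,
                 scale: float = 16.0, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim) * 0.01)
        self.scale = scale
        # Sigma^{-1/2} via eigendecomposition of the regularized covariance.
        eigval, eigvec = torch.linalg.eigh(cov + eps * torch.eye(dim))
        whiten = eigvec @ torch.diag(eigval.clamp_min(eps).rsqrt()) @ eigvec.T
        self.register_buffer("whiten", whiten)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x @ self.whiten.T                  # Mahalanobis whitening
        x = F.normalize(x, dim=-1)             # unit-norm features
        w = F.normalize(self.weight, dim=-1)   # unit-norm class weights
        return self.scale * (x @ w.T)          # scaled cosine logits
```

The resulting logits can be trained with standard cross-entropy on top of the frozen ViT features; `scale` plays the usual temperature role of cosine classifiers, and the whitening step is what distinguishes this head from a plain cosine classifier.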