Response Patterns of Artificial Intelligence to Unethical Directive Speech Acts: Focusing on Indirect Directive Strategies

Jaehee Kim; Hansaem Kim

KCI listed (KCI 등재)

AI responses to unethical directive speech acts: The case of indirect and evasive strategies

Language: KOR
URL: https://db.koreascholar.com/Article/Detail/447158

The Sociolinguistic Journal of Korea (사회언어학)

Vol. 33, No. 4 (December 2025)
pp. 43-80

The Sociolinguistic Society of Korea (한국사회언어학회)

Abstract

This study examines how large language models (LLMs) recognize and refuse unethical directive speech acts by analyzing their responses to indirect and evasive user requests. Drawing on the Cross-Cultural Speech Act Realization Project (CCSARP), directive prompts were constructed at varying degrees of indirectness to evaluate the models' pragmatic inference abilities. The study was conducted in two stages. First, ChatGPT-4o (February 2025 version) showed a high rate of information leakage in response to indirect directives. Second, newer models (GPT-5, Claude Sonnet 3.7 and 4, and Gemini 2.5 Flash) were tested across four categories of unethical directives through multi-turn dialogues. Logistic regression with Benjamini–Hochberg FDR correction revealed that although the newer models displayed improved refusal performance overall, they remained vulnerable to highly indirect and non-conventional directives, particularly those related to discrimination and harmful behaviors. These results suggest that current AI safety systems rely heavily on surface-level keyword filtering, indicating the need for models to better learn diverse directive strategies and expressions in Korean. Moving beyond technology-centered safety evaluation, this study experimentally analyzes AI pragmatic response mechanisms and proposes directions for fostering ethical communication in future human–AI interactions.
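For readers unfamiliar with the Benjamini–Hochberg correction named in the abstract, the following is a minimal sketch of the step-up procedure in plain Python. It is not the authors' analysis code: the function name `benjamini_hochberg`, the significance level `alpha=0.05`, and the p-values are all hypothetical, chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null hypothesis
    is rejected while controlling the false discovery rate at alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) such that
    # p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per regression coefficient tested.
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
example_rejected = benjamini_hochberg(example_p_values)
print(example_rejected)
```

With these made-up inputs only the two smallest p-values survive the correction, although four of the five fall below an uncorrected 0.05 threshold. This is the kind of adjustment that matters when many model-by-category comparisons are tested at once.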

Keywords

AI language ability evaluation; unethical directive speech acts; CCSARP; safety evaluation; ethical response

Abstract
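1. Introduction
2. Theoretical Background
    2.1. AI Safety Research
    2.2. Human Commands and Directive Speech Acts
3. Research Method
    3.1. Experimental Design
    3.2. AI Models Tested
4. Results
    4.1. First Experiment
    4.2. Second Experiment
5. Conclusion
References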

Authors

Jaehee Kim (김재희): Ph.D. Student, Department of Korean Language and Literature, Yonsei University, 50 Yonsei-ro, Seoul 03722, South Korea
Hansaem Kim (김한샘), corresponding author: Professor, Interdisciplinary Graduate Program of Linguistics and Informatics, Yonsei University, 50 Yonsei-ro, Seoul 03722, South Korea
