A comparative analysis of hate speech corpora by target groups
This study compares four major Korean hate speech corpora to analyze how they categorize target groups and how construction methods affect the resulting data. The results show that the proportions of the key target groups (gender, origin, physical traits, and ideology) vary greatly depending on how each corpus was compiled. In particular, keyword-based datasets showed a notably higher focus on gender, clearly reflecting the subjectivity of the data collectors and the social issues of the time. Furthermore, by analyzing the intersectionality and exclusivity of hate speech, we identified complex interconnections among target groups, as well as classification challenges arising from contextual usage. The study recommends developing theme-specific and context-aware corpora to capture these features accurately.