Implementation of a Music Generation System Through Automatic Text Prompt Generation: Focusing on Emotion Analysis Through Music Analysis and Lyrics Analysis

Min-Jin Kim; Jin-Wan Park


KCI listed

Korean title: 텍스트 프롬프트 자동생성을 통한 음악 생성시스템 구현에 관한 연구: 음악 분석을 통한 감정 분석과 가사 분석을 중심으로


Language: English
URL: https://db.koreascholar.com/Article/Detail/437232


Journal of The Korean Society for Computer Game

Vol. 37, No. 3 (September 2024)
pp. 51-58

Published by the Korean Society for Computer Game

Abstract

Existing AI-based music generation models and studies mostly address generation from manually written text prompts. This paper proposes a music generation system that automates text prompt creation in order to improve user convenience and streamline the creative process. A dataset is built from two sources: music analysis data extracted from audio files, and keywords extracted from lyrics, together with metadata such as genre, artist, and album. The lyrics are tokenized with KoNLPy, a Python natural language processing library for Korean, and important terms are extracted through TF-IDF (Term Frequency-Inverse Document Frequency) vectorization. A model predicts emotion from audio features such as MFCC and tempo, and text prompts are then generated automatically using a CNN model and ChatGPT. These prompts are fed into the MusicGen model to create new music that reflects the user's emotional state and musical preferences. The results are expected to contribute to the field of music data analysis and generation.
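The lyrics-keyword and prompt-assembly steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: a whitespace tokenizer stands in for KoNLPy, a fixed string template stands in for the ChatGPT step, the emotion label and tempo are taken as already predicted or measured, and the TF-IDF weighting uses one common smoothed IDF variant. The function names `tokenize`, `tfidf_keywords`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace splitting stands in for a KoNLPy
    # morphological analyzer, which the paper uses for Korean lyrics.
    return text.lower().split()


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        scores = {}
        for term, count in tf.items():
            # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            scores[term] = (count / total) * idf
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


def build_prompt(emotion, tempo_bpm, keywords, genre):
    # Assumption: a fixed template replaces the ChatGPT-based prompt
    # writing step; the emotion label would come from the prediction model.
    themes = ", ".join(keywords)
    return f"{emotion} {genre} track, around {round(tempo_bpm)} BPM, lyrical themes: {themes}"


lyrics = [
    "rain rain tears night",
    "sun dance night",
    "sun smile dance dance",
]
keywords = tfidf_keywords(lyrics, top_k=2)
print(build_prompt("sad", 71.6, keywords[0], "ballad"))
```

In a fuller pipeline, the resulting string would be the text condition passed to a text-to-music model such as MusicGen; that step is omitted here because it depends on model weights and audio tooling.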

Keywords

Music Generation, Text Prompt Automation, Emotion Analysis, Deep Learning, CNN, LSTM, Music Recommendation

ABSTRACT
1. Introduction
    1.1 Background and Motivation
    1.2 Research Objective
2. Literature Review
    2.1 Music Analysis and Classification
    2.2 Lyrics Data Analysis and Emotion Analysis
    2.3 Music Generation
3. Methodology
    3.1 Data Collection
    3.2 Data Preprocessing
    3.3 Audio Feature Extraction
    3.4 Emotion Prediction
    3.5 Prompt Generation
    3.6 Music Generation
    3.7 Evaluation of Generated Music
4. Results and Analysis
    4.1 Survey Results
    4.2 Overall Analysis
    4.3 Performance Evaluation of the Emotion Prediction Model
5. Conclusion and Future Research
    5.1 Conclusion
References
<Korean Abstract>
<Conclusion and Future Research>

Authors

Min-Jin Kim (김민진), Department of Technology Art, GSAIM, Chung-Ang University, Art Center, Seoul 06974, Korea
Jin-Wan Park (박진완), Department of Technology Art, GSAIM, Chung-Ang University, Art Center, Seoul 06974, Korea (corresponding author)
