Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network

Ryuhei Sakurai; Taiki Shimba; Hirotake Yamazoe; Joo-Ho Lee

KCI-listed

Language: English
URL: https://db.koreascholar.com/Article/Detail/342186

The Journal of Korea Robotics Society

Vol. 13, No. 1 (Serial No. 47) (2018.02)
pp. 16-25

Korea Robotics Society

Abstract

A talking head (TH) is an animation of a speaking face generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input alone. Generating a TH from speech can be regarded as a regression problem from an acoustic feature sequence to a facial code sequence, where a facial code is a low-dimensional vector representation that can efficiently encode and decode a face image. This regression was modeled with a bidirectional RNN and trained on the SAVEE database, a database of frontal-view animations of speaking faces. The proposed method can generate a TH with facial expression and intonation by using acoustic features such as MFCC, dynamic elements of MFCC, energy, and F0. In the experiments, the configuration that used BLSTM layers for the first and second layers of the bidirectional RNN predicted the facial code best. For the evaluation, a questionnaire survey was conducted with 62 people who watched TH animations generated by the proposed method and by the previous method. As a result, 77% of the respondents answered that the TH generated by the proposed method matched the speech well.
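
To make the input side of this regression concrete, here is a minimal sketch of how a per-frame acoustic feature vector could be assembled from MFCC, its dynamic elements, energy, and F0. It assumes that "dynamic elements" means the usual regression-based delta coefficients; the abstract does not say how they are computed. The function names `delta` and `build_features`, the window size, and the toy values are all hypothetical and are not taken from the paper.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over +/- `window` frames.

    Assumption: edge frames are handled by repeating the first/last frame.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def build_features(mfcc: List[Frame], energy: Frame, f0: Frame,
                   window: int = 2) -> List[Frame]:
    """Concatenate MFCC, delta-MFCC, energy, and F0 into one vector per frame."""
    d = delta(mfcc, window)
    return [m + dm + [e, f] for m, dm, e, f in zip(mfcc, d, energy, f0)]


# Toy input: 4 frames of 2-dimensional "MFCC" rising linearly over time.
mfcc = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
energy = [0.1, 0.2, 0.3, 0.4]
f0 = [120.0, 121.0, 0.0, 0.0]  # 0.0 is a placeholder for unvoiced frames

features = build_features(mfcc, energy, f0, window=1)
```

Each element of `features` is one time step of the acoustic feature sequence. In the setting described above, a sequence like this would be the input to the bidirectional RNN, and the facial code sequence would be the regression target.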

Keywords

Talking heads; Recurrent neural network; Acoustic features; Facial features

Abstract
1. Introduction
  1.1 Related Researches
  1.2 Research Aim
2. Proposed method
  2.1 Dataset
  2.2 Sequential regression by RNN
3. Feature representation and extraction
  3.1 Audio feature
  3.2 Facial code
4. Experiments
  4.1 Details of experimental setup
  4.2 Details of networks and training
  4.3 Quantitative evaluation of the prediction models
  4.4 Qualitative evaluation for the synthesized talking heads
  4.5 Discussion
5. Conclusion
References

Authors

Ryuhei Sakurai (Ritsumeikan University, Shiga, Japan)
Taiki Shimba (Ritsumeikan University, Shiga, Japan)
Hirotake Yamazoe (Ritsumeikan University, Shiga, Japan)
Joo-Ho Lee (Ritsumeikan University, Shiga, Japan), corresponding author
