
2018.02 | KCI-indexed | Service discontinued (viewing restricted)

Synthesis of Expressive Talking Heads from Speech Using RNN

Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network

Ryuhei Sakurai, Taiki Shimba, Hirotake Yamazoe, Joo-Ho Lee

Journal of Korea Robotics Society, Vol. 13, No. 1 (Whole No. 47), pp. 16-25, Korea Robotics Society

A talking head (TH) is an animation of a speaking face generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input only. Generating a TH from speech can be treated as a regression problem that maps an acoustic feature sequence to a facial code sequence, where a facial code is a low-dimensional vector representation that can efficiently encode and decode a face image. We model this regression with a bidirectional RNN and train it on the SAVEE database, a database of frontal speaking-face animations. Using acoustic features such as MFCC, the dynamic components of MFCC, energy, and F0, the proposed method can generate a TH with facial expression and intonation. In our experiments, a bidirectional RNN whose first and second layers were BLSTM layers predicted the facial code best. For evaluation, a questionnaire survey was conducted with 62 participants who watched TH animations generated by the proposed method and by a previous method. As a result, 77% of the respondents answered that the TH generated by the proposed method matched the speech well.
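The abstract lists MFCC, the dynamic components of MFCC, energy, and F0 as the acoustic input to the regression. The following is a minimal sketch of how such a per-frame input vector could be assembled. It assumes that MFCC, energy, and F0 have already been extracted frame by frame, and it computes the dynamic (delta) components with the common regression formula d_t = sum_{n=1..N} n(c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2). The window N = 2 and edge padding by frame repetition are conventional choices, not details taken from the paper. The function names `delta` and `stack_features` are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def delta(frames: Sequence[Sequence[float]], window: int = 2) -> List[Frame]:
    """Regression-based delta (dynamic) coefficients (hypothetical helper).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    where out-of-range frames are replaced by the first/last frame (edge padding).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        row: Frame = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                later = frames[min(t + n, num_frames - 1)][d]   # clamp at last frame
                earlier = frames[max(t - n, 0)][d]              # clamp at first frame
                acc += n * (later - earlier)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(
    mfcc: Sequence[Sequence[float]],
    energy: Sequence[float],
    f0: Sequence[float],
    window: int = 2,
) -> List[Frame]:
    """Per-frame input vector [MFCC..., delta MFCC..., energy, F0] (assumed layout)."""
    if not (len(mfcc) == len(energy) == len(f0)):
        raise ValueError("mfcc, energy and f0 must have the same number of frames")
    deltas = delta(mfcc, window)
    return [
        list(m) + d + [float(e), float(f)]
        for m, d, e, f in zip(mfcc, deltas, energy, f0)
    ]


if __name__ == "__main__":
    # Toy 1-D "MFCC" track rising by 1.0 per frame. The centre frame has a full
    # window and recovers the slope 1.0; frames near the edges are attenuated
    # by the padding.
    ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print([round(r[0], 3) for r in delta(ramp)])  # prints [0.5, 0.8, 1.0, 0.8, 0.5]
```

The resulting frames-by-features sequence would then serve as the input to the bidirectional RNN that predicts the facial code sequence. The MFCC order, frame rate, and F0 extractor the authors used are not stated in this record, so those remain open choices in the sketch.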