Plausibility of reflecting rater severity and measuring fluency based on ASR-based oral proficiency testing
Serious inherent problems with practicality and with intra-rater and inter-rater reliability overshadow the known positive washback effects of performance assessment in language education. In particular, it has been well documented that inter-rater reliability poses a serious threat to overall test validity, since individual raters necessarily judge performance according to their own subjective criteria of severity. However, language testing has witnessed a remarkable series of breakthroughs in performance assessment with the advent of the information era. One such breakthrough applies state-of-the-art automatic speech recognition (ASR) technology to oral proficiency interviews (OPI). Although current ASR technologies may not yet produce results reliable enough for high-stakes standardized test administration, they do help address the thorny issues of practicality and inherent human inter-rater subjectivity. Accordingly, this paper investigates the degree to which ASR-based OPI ratings match comparable human-conducted OPI ratings, using correlational analyses conducted on the basis of degrees of rater severity. Furthermore, it explores a method of enhancing the robustness of ASR-based OPI ratings that capitalizes on suprasegmental information by measuring fluency based principally on the length of test-takers’ response times.
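
As a rough illustration of the correlational comparison described above, the following minimal sketch computes a correlation between ASR-based scores and human scores for raters grouped by severity. It assumes a Pearson coefficient, which the abstract does not specify; the function name `pearson_r`, the severity labels, and all score values are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores for five test-takers (not study data).
asr_scores = [2.0, 3.0, 3.5, 4.0, 5.0]

# Hypothetical human ratings, grouped by an assumed rater-severity label.
human_scores_by_severity = {
    "lenient": [3.0, 4.0, 4.5, 5.0, 6.0],
    "severe": [1.0, 2.5, 2.0, 3.5, 4.0],
}

# One correlation per severity group, comparing ASR and human ratings.
correlations = {
    label: pearson_r(asr_scores, scores)
    for label, scores in human_scores_by_severity.items()
}

for label, r in correlations.items():
    print(f"{label}: r = {r:.3f}")
```

A rank-based coefficient could be substituted if the ratings are treated as ordinal; the choice here is an assumption made only for illustration.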