This study investigated the feasibility of adopting an automatic scoring system (ASS) in a domestic English-speaking education context. The scope, test items, assessment criteria, scoring methods, and reporting strategies of six overseas English-speaking tests utilizing ASSs were examined. Moreover, a comparative analysis was conducted to identify disparities between ASS-based and non-ASS-based speaking tests. The findings were: 1) some ASS-based tests utilized ASS technology throughout the assessment, while others adopted a hybrid scoring system involving human raters; 2) compared to non-ASS-based tests, ASS-based tests used more test items targeting low-level skills such as sounds and forms but fewer test items targeting conversation- and discourse-level skills; 3) pronunciation, fluency, and vocabulary were widely employed as evaluation criteria, while organization, content, and task completion were used sparsely in most ASS-based tests; 4) differences between ASS-based and non-ASS-based tests in the application of assessment criteria and score calculation were minimal; and 5) some ASS-based tests provided criteria-specific results and feedback along with total scores and proficiency levels.
The present study investigated students’ preferences for the types of tasks used to assess English speaking performance. It further examined whether students’ task type preferences affected their perceptions of test effectiveness. One hundred eighty-two high school students responded to a self-report questionnaire, and a series of frequency analyses and paired-samples t-tests were used for the analysis. The results showed that students’ most preferred task types and their least preferred ones overlapped, suggesting that the task types used in school English-speaking performance tests are limited. Four key reasons determining students’ task type preferences were identified: task difficulty, emotional comfort, practical value, and interest. In addition, the results indicated that students’ task type preferences could affect their perceptions of task effectiveness. Overall, the results suggest the need to develop more varied task types for English-speaking performance tests and to help students become familiar with English-speaking performance tasks. Pedagogical implications were discussed along with the study’s limitations.
The purpose of this study was to investigate inter- and intra-rater reliability in an interview and a computerized oral test. It also examined whether rater characteristics influenced raters’ reliability and biases, and finally the scores of both tests were compared with those of the Versant test, which uses an automated computer rating system. For the study, data were collected from 21 Korean university students and 18 raters, either Korean speakers or native speakers of English, with various characteristics. Some of the main findings were as follows. First, rater severity differed significantly within each test, but each rater graded consistently across both tests, suggesting lower inter-rater reliability and higher intra-rater reliability. Second, rater severity was affected by rater characteristics such as mother tongue, gender, age, and major. Lastly, there was a positive correlation among the scores of the three tests, indicating that human and computer scores are strongly related.
This study investigates secondary school English teachers’ perceptions of and psychological burdens involved in the implementation of the speaking and writing tests of the National English Ability Test, which is being developed by the Korean Ministry of Education, Science and Technology. The study surveyed 138 secondary school English teachers in Seoul. Although more than half of the teachers were aware of the new test, 18% of the surveyed teachers did not know that speaking and writing skills would be assessed in it, and 22.7% were opposed to the productive skills test. More than half (56.2%) of the teachers felt a psychological burden regarding the inclusion of the speaking/writing tests. Although the teachers acknowledged that serving as raters for the new test would help improve their teaching, the majority were reluctant to participate in the actual rating process. The teachers felt that the difficulty of subjective rating and the lack of time for the speaking and writing tests were serious problems in implementing the new test. They were sensitive to students’ test anxiety and indicated that they felt a strong psychological burden when making judgments on students’ performances. Implications and suggestions are made based on the findings.
This study investigates the nature and validity of the PhonePass SET-10 test, which is designed to measure test-takers’ English oral proficiency using automated computer technology. For this study, data were collected from 84 Korean college students: their TEPS scores, their PhonePass SET-10 results, and their responses to a survey developed to measure their attitudes toward the new format of English speaking test. The analysis found a positive correlation between the participants’ TEPS scores and their PhonePass SET-10 results, indicating that a computer-based automated scoring system can contribute significantly to assessing students’ English oral proficiency, although the test items and the purposes for which the test is administered remain relatively limited. The participants also showed positive attitudes toward the PhonePass SET-10, expressing the hope that the newly developed speaking test would help them further their English study.
The purpose of this study is to explore the possibility of interface research between English discourse studies and speaking assessment. Assessment expertise has already been interfaced with discourse-based studies to develop and validate oral proficiency tests of English. One of the important developments in language assessment over recent years is the introduction of qualitative discourse research methodologies to design, describe, and validate direct or semi-direct tests of speaking. This paper explored assessment research in the following areas of discourse studies: interactional sociolinguistics, ethnography of communication, variation analysis, conversation analysis, and critical discourse analysis. The review of discourse-related assessment literature showed that discourse studies have much potential for validating current oral proficiency tests. Possible areas of discourse studies related to the new era of process-oriented language assessment were discussed.