When two raters assign discrepant ratings in writing assessments, a method of resolving the differences should be applied to improve the accuracy of the scores reported to examinees (operational scores). The present study, seeking the most effective resolution method for a context in which both experienced and novice raters participate in assessment, examined the tertium quid method against the simple average of the experienced and novice raters' original scores. Paired t-tests and interrater reliability estimates for seven pairs of raters showed that the ratings of experienced and novice raters differed significantly from each other. To investigate the accuracy of the averaged and tertium quid scores, Pearson correlations were computed between each set of resolved scores and a criterion score (the standard for decision making). Across the seven sets of scores and all six rating categories (Content, Organization, Style and Quality of Expression, Language Use, Mechanics, and Fluency), the tertium quid scores correlated substantially more strongly with the criterion than the averaged scores did, providing positive evidence for the tertium quid method as a means of score resolution. For future research, the study suggests that testing the efficacy of resolution methods is insufficient unless sources of rater variability are also considered.
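To make the two resolution procedures concrete, the sketch below contrasts them on toy data. It assumes one common tertium quid variant, in which a third expert rating is averaged with whichever of the two discrepant original ratings lies closer to it; the abstract does not specify the exact variant used, and all rater labels, scores, and criterion values here are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical ratings for seven essays on one category (e.g., Content).
rater_a = np.array([3, 4, 2, 5, 3, 4, 2], dtype=float)   # experienced rater
rater_b = np.array([4, 2, 3, 3, 4, 3, 4], dtype=float)   # novice rater
rater_c = np.array([4, 3, 2, 4, 3, 4, 3], dtype=float)   # third (expert) rater
criterion = np.array([4, 3, 2, 4, 4, 4, 3], dtype=float) # decision standard

# Paired t-test: do experienced and novice ratings differ systematically?
t, p = stats.ttest_rel(rater_a, rater_b)

# Method 1: simple average of the two original ratings.
averaged = (rater_a + rater_b) / 2

# Method 2 (assumed tertium quid variant): where the two original ratings
# disagree, average the third rating with the closer original rating;
# where they agree, keep the agreed rating as-is.
closer = np.where(np.abs(rater_a - rater_c) <= np.abs(rater_b - rater_c),
                  rater_a, rater_b)
tertium_quid = np.where(rater_a != rater_b, (closer + rater_c) / 2, rater_a)

# Accuracy check: correlate each set of resolved scores with the criterion.
for name, scores in [("averaged", averaged), ("tertium quid", tertium_quid)]:
    r, p_r = stats.pearsonr(scores, criterion)
    print(f"{name:12s} r = {r:.3f} (p = {p_r:.3f})")
```

In the study's design this comparison would be repeated for each of the seven rater pairs and each of the six rating categories, with the resolved-score-to-criterion correlation serving as the index of accuracy.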