Even though performance-based language tests are often multivariate by design, previous validation studies have analyzed the dependability of ratings of such tests within univariate analytic frameworks. To address this limitation, the present study investigated the dependability of judgments based on ratings of a German speaking test using a multivariate generalizability theory (MGT) analysis. Data obtained from 88 students were analyzed in G- and D-studies of a two-facet crossed design. A D-study with two raters and two tasks supported the high dependability of the ratings and of the placement decisions made at the predetermined cut scores. An optimization analysis of the measurement procedure suggested that the desired level of dependability had already been achieved with two raters and two tasks. The MGT analysis also generated useful information about the convergent/discriminant validity of the five subscales. Specifically, the universe score variance-covariance matrix obtained from the D-study showed that the underlying subscales were interrelated but distinct. Furthermore, an analysis of the effective weights of the subscales revealed that the Grammar subscale played the dominant role in the composite universe score variance.
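
For orientation, the following is a sketch of the standard quantities involved in such a design, not formulas reported by the study itself. In a fully crossed persons (p) × raters (r) × tasks (t) D-study, the dependability coefficient for absolute decisions is conventionally written as

\[
\Phi \;=\; \frac{\sigma^2(p)}
{\sigma^2(p) + \dfrac{\sigma^2(r)}{n'_r} + \dfrac{\sigma^2(t)}{n'_t}
 + \dfrac{\sigma^2(pr)}{n'_r} + \dfrac{\sigma^2(pt)}{n'_t}
 + \dfrac{\sigma^2(rt)}{n'_r\,n'_t} + \dfrac{\sigma^2(prt,e)}{n'_r\,n'_t}},
\qquad n'_r = n'_t = 2,
\]

where the \(\sigma^2(\cdot)\) are estimated variance components and \(n'_r\), \(n'_t\) are the D-study numbers of raters and tasks. In the multivariate case, given nominal subscale weights \(\mathbf{w}\) and the universe score variance-covariance matrix \(\boldsymbol{\Sigma}_p\), the composite universe score variance and the effective weight of subscale \(v\) are

\[
\sigma^2_C(p) \;=\; \mathbf{w}^\top \boldsymbol{\Sigma}_p\, \mathbf{w},
\qquad
\text{effective weight of } v \;=\;
\frac{w_v \sum_{v'} w_{v'}\,\sigma_p(v,v')}{\sigma^2_C(p)},
\]

which is the sense in which the Grammar subscale is said to dominate the composite universe score variance.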