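        The purpose of this study is to apply generalizability theory to the Group Assessment of Logical Thinking (GALT), in line with the purposes for which the test is used, and to analyze both the single-facet error sources that consider only items and examinees and the multi-facet error sources that consider items, examinees, and domains. The study involved a total of 1,016 elementary, middle, and high school students from provincial areas. The 21-item full version of the GALT was administered for 40 minutes, and the 12 items that make up the short version were extracted separately for reliability analysis based on generalizability theory. For the analysis, G studies and D studies were conducted under a p×i design and a p×(i:h) design. The results are as follows. First, in the D study of the full and short versions under the p×I design, the full version yielded a generalizability coefficient of 0.87 with 21 items, exceeding the adequate level of 0.80, and it still reached the adequate level with 13 items. The short version yielded 0.77 with 12 items, falling short of the adequate level, and its reliability reached the adequate level only with at least 15 items. Second, in the D study of the short version under the p×(I:H) design, six domains with two items each yielded 0.71, below the adequate level of 0.80, and reliability reached the adequate level only with at least five items per domain.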
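The item counts reported for the p×I design can be checked with a short sketch. It assumes the reported values are relative generalizability coefficients of the form Eρ² = σ²(p) / (σ²(p) + σ²(pi,e)/n_i). The abstract does not report the variance components themselves, so the ratio σ²(pi,e)/σ²(p) is backed out from the rounded coefficients here; the function names are hypothetical and are not taken from the study.

```python
def g_coefficient(error_ratio, n_items):
    """Relative G coefficient for a p x I design.

    error_ratio is sigma^2(pi,e) / sigma^2(p), an assumed quantity
    derived from the reported coefficient, not a reported value.
    """
    return 1.0 / (1.0 + error_ratio / n_items)


def error_ratio_from(coefficient, n_items):
    """Invert g_coefficient: recover the variance ratio from one reported point."""
    return n_items * (1.0 - coefficient) / coefficient


def min_items(error_ratio, target=0.80, max_items=100):
    """Smallest number of items whose coefficient reaches the target."""
    for n in range(1, max_items + 1):
        if g_coefficient(error_ratio, n) >= target:
            return n
    return None  # target not reachable within max_items


# Reported points: full version 0.87 at 21 items, short version 0.77 at 12 items.
full_ratio = error_ratio_from(0.87, 21)
short_ratio = error_ratio_from(0.77, 12)

print(min_items(full_ratio))   # 13
print(min_items(short_ratio))  # 15
```

Under these assumptions the sketch reproduces the thresholds stated in the abstract (13 items for the full version, at least 15 for the short version). The p×(I:H) result cannot be checked the same way, because a single reported coefficient does not determine the two separate error components of that design.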
        Because a performance assessment such as a composition test introduces a range of factors that may influence a candidate's chances of success on the test, those in charge of quality control for performance assessment programs need to gather information that will help them determine whether all aspects of the programs are working as intended. In the present study, generalizability theory (Brennan, 1992) was employed to examine the relative effects of various sources of variability on students' performance on an essay writing test and to investigate the reliability of the assigned scores. The results showed that the largest effect was associated with the students' writing ability, while the effects associated with facets of measurement such as scoring criteria and ratings were negligible. As a result, the generalizability coefficient estimated for the writing test was high, suggesting that the test is a reliable measure of what it purports to measure, namely the students' writing ability.
        Even though performance-based language tests are often multivariate by design, previous validation studies have analyzed the dependability of ratings of such tests within univariate analytic frameworks. In an attempt to address this limitation, the present study investigated the dependability of judgements based on ratings of a German speaking test using a multivariate generalizability theory (MGT) analysis. Data obtained from 88 students were analyzed in G- and D-studies of a two-facet crossed design. A D-study with two raters and two tasks supported the high dependability of the ratings and of the placement decisions made at the pre-determined cut scores. An optimization analysis of the measurement procedure suggested that the desired level of dependability of ratings had already been achieved with two raters and two tasks. The MGT analysis also generated other useful information about the convergent/discriminant validity of the five subscales. Specifically, the universe score variance-covariance matrix obtained from the D-study showed that the underlying subscales were interrelated but distinct. Furthermore, the analysis of effective weights of the subscales revealed that the Grammar subscale played the dominant role in the composite universe score variance.