The authors assessed correct model identification rates of Akaike's information criterion (AIC), corrected criterion (AICC), consistent AIC (CAIC), Hannan and Quinn's information criterion (HQIC), and Bayesian information criterion (BIC) for selecting among cross-classified random effects models. Performance of the default values for the 5 indices used by SAS PROC MIXED when estimating a 2-level cross-classified random effects model was compared with modifications to the sample size used in the AICC, CAIC, HQIC, and BIC formulations. The sample sizes explored included the number of level 1 units (N), the average number of classification units (m), and the number of nonempty classification cells (c). The authors also assessed performance of the χ²diff test for testing the difference in fit between 2 nested cross-classified random effects models. The χ²diff test exhibited a slightly inflated Type I error rate with high power. The modified information criteria performed better than the default values did. Pairing of N with the HQIC, BIC, and CAIC and of m with the AICC worked best. Results and suggestions for future research are discussed.
Keywords: cross-classified model; information criteria; model fit indices; multilevel model
ANALYSES OF LARGE-SCALE EDUCATIONAL DATASETS are frequently complicated by the clustering of individuals within common contexts. For example, in cross-sectional data, a dataset might consist of students clustered within classrooms or schools. Longitudinal data consists of repeated measures on (and thus clustered within) individuals who might be further clustered by schools. This clustering leads to violation of the assumption of independence that is made when estimating a multiple regression model using, for example, ordinary least squares estimation. Violating this assumption can lead to spurious inferences being made on the basis of tests of parameter estimates with concomitant inflated type I error rates (Hox, [
However, use of the conventional multilevel model only works for purely clustered data. Figure 1 contains a network graph depicting the pure clustering of students (level 1) within middle schools (level 2), with multiple middle schools feeding into each high school (level 3). The pureness of the clustering is evidenced in the figure by the fact that the lines connecting students to their middle and high schools do not cross. The three levels of the hierarchy are evident because, for each high school, there is a closed set of middle schools that feed solely into that high school. For example, any student in the dataset who attended middle schools MS1, MS2, or MS3 went on to attend HSI (see Figure 1). Similarly, any student in the dataset who attended middle schools MS4 or MS5 then attended HSII.
FIGURE 1 Network graph depicting pure clustering of students within MSs and of MSs within HSs. HS = high school; MS = middle school; STU = student.
Real-world data with two clustering variables (here, middle and high school) do not always fit into this pure kind of hierarchy. Instead, a dataset might entail what is termed a cross-classified structure, wherein one of the two higher level clustering units is not purely clustered within the other. In the context of the present example, the dataset consists of multiple students per (i.e., clustered within each) middle school and of multiple students per high school; however, middle schools might not be purely clustered within high schools (contrary to what is depicted in Figure 1). Similarly, high schools might not be purely clustered within middle schools. In other words, there might not be a closed set of middle schools affiliated with each high school (or vice versa). Thus, students (level 1) are cross-classified by middle school and by high school, and each clustering unit can be considered a level 2 cross-classification factor.
Figures 2 and 3 depict the same cross-classified dataset in which students are clustered within middle schools and students are also clustered within high schools but neither clustering unit is clustered within the other. Figures 2 and 3 differ solely in that the students are presented in different orders. In Figure 2, students are aligned by middle school. In Figure 3, students are aligned by high school. The cross-classification is evidenced in these figures by the crossing of the lines connecting students with one of the two higher level clustering units (see Beretvas, [
FIGURE 2 Network graph depicting cross-classification of students by MSs and by HSs with students lined up within MSs. HS = high school; MS = middle school; STU = student.
FIGURE 3 Network graph depicting cross-classification of students by HSs and by MSs with students lined up within HSs. HS = high school; MS = middle school; STU = student.
There are several possible examples of cross-classified data in educational research. This includes the example already given (middle and high schools) as well as the classic example of school and neighborhood (Garner & Raudenbush, [
Recent methodological studies (e.g., Luo & Kwok, [
The next few sections review the formulations of possible information criteria for use with the conventional multilevel model as well as summarizing research on how well the information criteria support the fit of the correct multilevel model. This discussion addresses sample size modifications to the information criteria that have been suggested and assessed in studies by Gurka ([
Multilevel modelers do not frequently use fit indices to assess or compare the fit of multilevel models and instead tend to use statistical significance test results for each parameter estimate of interest under the assumption that the model fits. Whittaker and Furlow ([
Although they are not commonly used by multilevel modeling researchers, fit indices are automatically produced in the output of commonly used statistical modeling software programs (e.g., SAS, MLwiN, Mplus, HLM, LISREL). Fit indices could prove useful given the questionable use (Goldstein, [
When comparing two nested multilevel models, researchers can use the chi-square difference test, χ²diff:

χ²diff = −2LL_R − (−2LL_F),    (1)

where LL_R and LL_F are the log likelihoods of the more restricted (nested) model and the full model, respectively, and the resulting statistic is referred to a chi-square distribution with degrees of freedom equal to the difference in the number of parameters estimated in the two models.
Calculation of the log likelihood depends on the estimation procedure used. If the two models being compared differ only in their random effects' specification, then the deviance difference in Equation 1 can be tested using restricted (also known as residual) maximum likelihood (REML) estimates of the LLs (Verbeke & Molenberghs, [
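As a minimal sketch of the deviance difference test in Equation 1 (the deviance values and the df = 1 critical value below are illustrative assumptions, not figures from the study):

```python
def chi_square_diff(deviance_nested, deviance_full):
    """Chi-square difference statistic (Equation 1): the more restricted
    (nested) model's deviance (-2 log likelihood) minus the full model's
    deviance."""
    return deviance_nested - deviance_full

# Hypothetical deviances for two models differing by one parameter;
# 3.841 is the chi-square critical value for df = 1 at alpha = .05.
stat = chi_square_diff(2480.6, 2474.1)
reject = stat > 3.841
```

With these illustrative values the statistic is 6.5, so the null hypothesis of equal fit would be rejected and the full model retained.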
Information criteria can also be used to compare the fit of nonnested models. However, when comparing the fit of two nonnested linear mixed models, the χ
Information criteria are not statistics per se but are indices whose values can be compared. In general, and depending on the formulation (see Gurka, [
Akaike's information criterion (AIC; Akaike, [
AIC = −2LL + 2q,    (2)
where q represents the number of random and fixed effects parameters estimated in the model (when FIML estimation is used). Hurvich and Tsai ([
AICC = −2LL + 2q[N*/(N* − q − 1)].    (3)
The Bayesian information criterion (BIC; Schwarz, [
BIC = −2LL + q ln(N*).    (4)
Another consistent criterion is Bozdogan's consistent AIC (CAIC; Bozdogan, [
CAIC = −2LL + q[ln(N*) + 1].    (5)
Last, SAS software includes Hannan and Quinn's information criterion (HQIC; Hannan & Quinn, [
HQIC = −2LL + 2q ln[ln(N*)].    (6)
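The five criteria in Equations 2 through 6 differ only in their penalty term and can all be computed from the same three quantities: the deviance, the number of estimated parameters q, and the chosen sample-size value N*. A minimal sketch, assuming the standard formulations (the function name is ours; N* must exceed q + 1 for the AICC and exceed 1 for the HQIC's double logarithm):

```python
import math

def info_criteria(neg2ll, q, n_star):
    """Compute the five information criteria from the deviance
    (neg2ll = -2 log likelihood), the number of estimated fixed- and
    random-effects parameters (q), and the sample-size value N* (n_star)."""
    return {
        "AIC":  neg2ll + 2 * q,
        "AICC": neg2ll + 2 * q * n_star / (n_star - q - 1),
        "BIC":  neg2ll + q * math.log(n_star),
        "CAIC": neg2ll + q * (math.log(n_star) + 1),
        "HQIC": neg2ll + 2 * q * math.log(math.log(n_star)),
    }

# The model with the smallest criterion value is preferred.
ic = info_criteria(neg2ll=1000.0, q=5, n_star=100)
```

Note how the penalties order themselves for this illustrative input: the AIC penalizes least, and the CAIC (which adds q beyond the BIC penalty) penalizes most.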
Some methodological research has been conducted that has compared identification of the correct multilevel model using information criteria (e.g., Gurka, [
Gurka (2006) compared correct model identification using the AIC, AICC, BIC, and CAIC paired with N and m under FIML estimation. In addition, correct model identification rates for the four information criteria were compared when REML estimation was used with two different restricted log-likelihood functions and with the information criteria paired with N − q and m. Gurka compared the fit of three sets of three different models (the correctly specified, underparameterized, and overparameterized models). The sets were characterized by differences in their fixed effects, their random effects, or a combination of both. Gurka manipulated level 1 and level 2 variance values to provide conditions with different intraclass correlation (ρ
When comparing models that differed only in their fixed-effects, Gurka ([
Across information criteria, Gurka ([
Whittaker and Furlow's (2009) study extended Gurka's (2006) investigation in a number of ways. First, the authors included an assessment of the performance of the HQIC criterion. In addition, the authors evaluated the performance of the five information criteria (see Equations 2 through 6) with some models that were more complex than those used by Gurka (including models with random slopes and cross-level interactions). Last, Whittaker and Furlow (2009) also used multilevel modeling to model clustered data rather than repeated measures data, with larger within-level-two sample sizes. As did Gurka ([
Overall, Whittaker and Furlow (2009) found that the BIC and CAIC indices worked best in the scenarios that they evaluated. However, Whittaker and Furlow found that BIC
As did Gurka ([
In summary, results found in Gurka ([
As noted earlier, use of conventional multilevel models permits modeling of purely clustered data, such as the clustering of students (level 1) within middle schools (level 2) within high schools (level 3). However, use of this three-level model would only work if middle schools (MSs) were purely clustered within high schools (HSs) or vice versa. In practice, it is more likely that neither clustering variable is perfectly nested within the other. For scenarios in which there are two clustering variables (here, MS and HS) that are not clustered within each other, a cross-classified data structure results. In the current example, students are cross-classified by two level 2 classification factors (namely, MS and HS). The dependency resulting from the clustering of students within each of the two classifications must still be appropriately handled, using the CCREM rather than a conventional multilevel model. Use of the CCREM permits partitioning of the variability in the outcome of interest into the portion attributable to the individual student and to the effects of each of the cross-classification factors (here, MS and HS).
A brief review of the two-level CCREM is provided here although the reader is encouraged to review more detailed descriptions published elsewhere (e.g., see Hox, [
As with conventional multilevel models, an unconditional model is formulated that contains no predictors. Adopting Raudenbush and Bryk's (2002) levels formulation, at level 1, the outcome for student i, who attended MS j1 and HS j2, is modeled as follows:
Y_i(j1j2) = π_0(j1j2) + e_i(j1j2),    (7)
where it is assumed that the level 1 residual, e_i(j1j2), is normally distributed with a mean of zero and variance σ². At level 2, the unconditional CCREM would be the following:
π_0(j1j2) = θ_0 + b_0j1 + c_0j2,    (8)
where b_0j1 is the residual for MS j1 and c_0j2 is the residual for HS j2, each assumed normally distributed with a mean of zero and variances τ_b0 and τ_c0, respectively. Substituting Equation 8 into Equation 7 yields the combined model:
Y_i(j1j2) = θ_0 + b_0j1 + c_0j2 + e_i(j1j2).    (9)
Equation 9 clearly demonstrates the partitioning of the variability in the outcome, Y_i(j1j2), into the portions attributable to variability across students (σ²), MSs (τ_b0), and HSs (τ_c0). In conventional multilevel modeling, the intraclass correlation, ρ, describes the proportion of outcome variance attributable to the clustering; for the CCREM, a per-classification ρ can be computed, for example, for the MS classification:
ρ_MS = τ_b0 / (σ² + τ_b0 + τ_c0).    (10)
The ρ for the HS classification, ρ_HS, is obtained analogously, with τ_c0 in the numerator of Equation 10.
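The variance partition in Equations 9 and 10 can be made concrete with a short sketch (the function name is ours, and the variance components are illustrative values chosen so that each classification factor accounts for 15% of the total variance):

```python
def cc_intraclass_correlations(sigma2, tau_ms, tau_hs):
    """Per-classification intraclass correlation coefficients for the
    unconditional two-level CCREM (Equation 10): each classification
    factor's share of the total outcome variance."""
    total = sigma2 + tau_ms + tau_hs
    return tau_ms / total, tau_hs / total

# Illustrative variance components; the study's generating conditions
# include intercept ICC values of 0.15 and 0.30.
rho_ms, rho_hs = cc_intraclass_correlations(sigma2=0.70, tau_ms=0.15, tau_hs=0.15)
```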
The conditional CCREM includes predictors at any of the levels. For example, a level 1 predictor, X, could be added to the model such that Equation 7 becomes:
Y_i(j1j2) = π_0(j1j2) + π_1(j1j2)X_i(j1j2) + e_i(j1j2),    (11)
and the level 2 model (Equation 8) becomes:
π_0(j1j2) = θ_0 + b_0j1 + c_0j2
π_1(j1j2) = θ_1,    (12)
if the effect of X is assumed to be fixed across MSs and HSs (i.e., Equation 12 includes no middle nor high school residuals in the formulation of the coefficient for X, π_1(j1j2)). Equation 12 could be extended to include MS, M, and HS, H, predictors should there be substantial variability in the intercept term (i.e., if τ_b0 > 0 and τ_c0 > 0):
π_0(j1j2) = θ_0 + γ_01M_j1 + γ_02H_j2 + b_0j1 + c_0j2
π_1(j1j2) = θ_1,    (13)
where the γ coefficients represent the fixed effects of the MS and HS predictors on the intercept.
Alternatively, the effect of X could be modeled as randomly varying across both MSs and HSs through the addition of the b_1j1 and c_1j2 residuals to the formula for π_1(j1j2) as follows:
π_0(j1j2) = θ_0 + b_0j1 + c_0j2
π_1(j1j2) = θ_1 + b_1j1 + c_1j2.    (14)
The M and H predictors could also be added to the random-slopes model in Equation 14 to explain variability in, say, the intercept while modeling variability in the slope of X across MSs and HSs, resulting in the following level 2 model:
π_0(j1j2) = θ_0 + γ_01M_j1 + γ_02H_j2 + b_0j1 + c_0j2
π_1(j1j2) = θ_1 + b_1j1 + c_1j2,    (15)
where each classification's pair of residuals is assumed multivariate normally distributed with means of zero and a 2 × 2 covariance matrix:

[b_0j1, b_1j1]′ ~ N(0, T_b) and [c_0j2, c_1j2]′ ~ N(0, T_c),

with T_b and T_c each containing the relevant classification's intercept and slope residual variances (τ_b0 and τ_b1, or τ_c0 and τ_c1) and their covariance.
Additional combinations of predictors and random effects, and even additional levels, can be modeled, although these are not detailed further here. Parameters (and associated standard errors) of the CCREM can be estimated using several multilevel modeling software programs (including MLwiN and HLM) as well as more general statistical software programs (SAS, R, and Stata).
As part of a larger study, Meyers and Beretvas (2006) assessed the performance of the BIC and AIC for data generated to fit a CCREM. The authors estimated the correct CCREM and then a conventional multilevel model that ignored one of the classification factors and was thus underparameterized. Meyers and Beretvas ([
Meyers and Beretvas ([
When SAS PROC MIXED is used to estimate CCREMs[
Therefore, the present study assessed use of the AIC, AICC, BIC, CAIC, and HQIC for correct CCREM model identification. Each information criterion was assessed using its default formulation in SAS and (if different) in combination with N, m, and c as values for N*. The model fit of three CCREMs was compared: the correctly specified CCREM, an underparameterized CCREM (excluding one truly nonzero fixed effect), and an overparameterized CCREM (including an additional truly zero fixed effect). We conducted these three-model comparisons for two true models. One true model included a fixed level 1 predictor and the other included a level 1 predictor that varied randomly across both level 2 classifications. Several design conditions were manipulated that contribute to the degree of clustering, the sample size, and the degree of cross-classification.
We conducted a simulation study to assess correct CCREM model identification using the AIC, BIC, HQIC, CAIC, and AICC. The AICC, HQIC, BIC, and CAIC were each paired with three alternative values for N*. The N* alternatives included the total sample size, N; the maximum number of units of the cross-classification factors, m; and the number of nonempty cross-classification cells, c. The AIC is not a function of N, so it was not modified to include the different values of N*. In addition, we also assessed the default values (where N* = 1) used in SAS PROC MIXED for the CAIC, HQIC, and BIC. The N* used with each information criterion is identified here by a subscript representing the N*. Thus, we assessed the performance of the following 16 information criteria: AIC, AICC
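Of the three N* alternatives, c is the least familiar; it can be tallied directly from the two classification identifiers. A minimal sketch (the function name and the toy identifiers are ours):

```python
def nonempty_cells(ms_ids, hs_ids):
    """Number of nonempty cross-classification cells, c: the count of
    distinct (MS, HS) combinations containing at least one student."""
    return len(set(zip(ms_ids, hs_ids)))

# Five students: MS 1 sends students to HSs 1 and 2; MS 2 to HSs 2 and 3,
# yielding four nonempty cells.
c = nonempty_cells([1, 1, 2, 2, 2], [1, 2, 2, 2, 3])
```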
We assessed performance of the information criteria and χ
The level 1 equation used for the FixedX and RandomX models matches Equation 11, which includes a single level 1 predictor, X. The level 2 equation for the FixedX generating model included a MS predictor, M, in the equation for the intercept, as well as residual variability across middle and high schools, and was as follows:
π_0(j1j2) = θ_0 + γ_01M_j1 + b_0j1 + c_0j2
π_1(j1j2) = θ_1.    (16)
The same intercept, π_0(j1j2), equation as in Equation 16 was used for the RandomX generating model; however, the equation for the level 1 predictor's slope coefficient, π_1(j1j2), included residual terms for both MSs and HSs (i.e., the slope randomly varied across both classification factors):
π_0(j1j2) = θ_0 + γ_01M_j1 + b_0j1 + c_0j2
π_1(j1j2) = θ_1 + b_1j1 + c_1j2.    (17)
Parameter values used when generating FixedX and RandomX data included γ
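To make the FixedX generating model (Equation 16) concrete, the following sketch simulates one cross-classified dataset. All numeric values (the fixed effects, the variance components, the circulant two-HS feeder pattern, and the 64%/36% split) are illustrative stand-ins, not the study's exact generating values:

```python
import math
import random

random.seed(42)

# Illustrative design values loosely matching one study condition.
m = 25          # units per classification factor (number of MSs = number of HSs)
n_bar = 50      # students per middle school
theta0, theta1, gamma01 = 0.0, 0.5, 0.3
sigma2, tau_ms, tau_hs = 0.70, 0.15, 0.15   # intercept ICCs of .15 and .15

b0 = [random.gauss(0.0, math.sqrt(tau_ms)) for _ in range(m)]  # MS residuals
c0 = [random.gauss(0.0, math.sqrt(tau_hs)) for _ in range(m)]  # HS residuals
M = [random.gauss(0.0, 1.0) for _ in range(m)]                 # MS predictor

rows = []
for j1 in range(m):
    # Each MS feeds two HSs (a circulant stand-in for the Table 2 pattern),
    # with an unbalanced 64%/36% split of its students.
    feeder_hs = (j1, (j1 + 1) % m)
    for _ in range(n_bar):
        j2 = feeder_hs[0] if random.random() < 0.64 else feeder_hs[1]
        x = random.gauss(0.0, 1.0)
        y = (theta0 + gamma01 * M[j1] + b0[j1] + c0[j2]
             + theta1 * x + random.gauss(0.0, math.sqrt(sigma2)))
        rows.append((j1, j2, x, y))
```

Each generated row carries both classification identifiers (j1, j2), so the resulting dataset has the crossed, rather than nested, structure that the CCREM requires.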
Fit of the generated datasets was compared across three estimating models. The three estimating models included the correct model (Equation 16 or 17, depending on the condition), an underparameterized model, and an overparameterized model. The underparameterized model excluded one parameter as compared with the relevant correct model; the overparameterized model included one unnecessary parameter. More specifically, the underparameterized model did not include the MS predictor, M, and the overparameterized model included a fixed-effect coefficient for the HS predictor, H, of the intercept, with a true value of zero. Thus, for the FixedX-generated data, the models in Equations 12, 16, and 13 corresponded to the underparameterized, correct, and overparameterized models, respectively. For the RandomX-generated data, the relevant underparameterized, correct, and overparameterized models appear in Equations 14, 17, and 15, respectively.
These very basic models were chosen as a simple starting point for this assessment of the performance of the various information criteria. In addition, although information criteria are typically used when comparing nonnested models, they can also be used to choose between nested models. The use of the three nested models in the present study permitted assessment of the performance of the χ
We used SAS to generate data, estimate the models (SAS PROC MIXED), and summarize results. Given that the models being compared differed only in their fixed-effects specification, FIML estimation was required and thus was used. Correct model identification rates were compared across the three estimating models for both the FixedX and RandomX scenarios. The following conditions were manipulated: the per-classification intercept ρ
Table 1 lists design conditions and their values. We subsequently provide further details.
TABLE 1 Simulation Study Conditions
Condition | Intercept ICC: MS | Intercept ICC: HS | % to 1st HS | % to 2nd HS | % to 3rd HS | % to 4th HS | Units per factor (m) | Students per MS
1 | 0.15 | 0.15 | 64% | 36% | — | — | 50 | 25
2 | 0.15 | 0.30 | 64% | 36% | — | — | 50 | 25
3 | 0.30 | 0.15 | 64% | 36% | — | — | 50 | 25
4 | 0.15 | 0.15 | 50% | 50% | — | — | 50 | 25
5 | 0.15 | 0.30 | 50% | 50% | — | — | 50 | 25
6 | 0.30 | 0.15 | 50% | 50% | — | — | 50 | 25
7 | 0.15 | 0.15 | 64% | 36% | — | — | 25 | 50
8 | 0.15 | 0.30 | 64% | 36% | — | — | 25 | 50
9 | 0.30 | 0.15 | 64% | 36% | — | — | 25 | 50
10 | 0.15 | 0.15 | 50% | 50% | — | — | 25 | 50
11 | 0.15 | 0.30 | 50% | 50% | — | — | 25 | 50
12 | 0.30 | 0.15 | 50% | 50% | — | — | 25 | 50
13 | 0.15 | 0.15 | 64% | 12% | 12% | 12% | 50 | 25
14 | 0.15 | 0.30 | 64% | 12% | 12% | 12% | 50 | 25
15 | 0.30 | 0.15 | 64% | 12% | 12% | 12% | 50 | 25
16 | 0.15 | 0.15 | 25% | 25% | 25% | 25% | 50 | 25
17 | 0.15 | 0.30 | 25% | 25% | 25% | 25% | 50 | 25
18 | 0.30 | 0.15 | 25% | 25% | 25% | 25% | 50 | 25
19 | 0.15 | 0.15 | 64% | 12% | 12% | 12% | 25 | 50
20 | 0.15 | 0.30 | 64% | 12% | 12% | 12% | 25 | 50
21 | 0.30 | 0.15 | 64% | 12% | 12% | 12% | 25 | 50
22 | 0.15 | 0.15 | 25% | 25% | 25% | 25% | 25 | 50
23 | 0.15 | 0.30 | 25% | 25% | 25% | 25% | 25 | 50
24 | 0.30 | 0.15 | 25% | 25% | 25% | 25% | 25 | 50
25 | 0.15 | 0.15 | 64% | 36% | — | — | 50 | 50
26 | 0.15 | 0.30 | 64% | 36% | — | — | 50 | 50
27 | 0.30 | 0.15 | 64% | 36% | — | — | 50 | 50
28 | 0.15 | 0.15 | 50% | 50% | — | — | 50 | 50
29 | 0.15 | 0.30 | 50% | 50% | — | — | 50 | 50
30 | 0.30 | 0.15 | 50% | 50% | — | — | 50 | 50
31 | 0.15 | 0.15 | 64% | 36% | — | — | 25 | 25
32 | 0.15 | 0.30 | 64% | 36% | — | — | 25 | 25
33 | 0.30 | 0.15 | 64% | 36% | — | — | 25 | 25
34 | 0.15 | 0.15 | 50% | 50% | — | — | 25 | 25
35 | 0.15 | 0.30 | 50% | 50% | — | — | 25 | 25
36 | 0.30 | 0.15 | 50% | 50% | — | — | 25 | 25
37 | 0.15 | 0.15 | 64% | 12% | 12% | 12% | 50 | 50
38 | 0.15 | 0.30 | 64% | 12% | 12% | 12% | 50 | 50
39 | 0.30 | 0.15 | 64% | 12% | 12% | 12% | 50 | 50
40 | 0.15 | 0.15 | 25% | 25% | 25% | 25% | 50 | 50
41 | 0.15 | 0.30 | 25% | 25% | 25% | 25% | 50 | 50
42 | 0.30 | 0.15 | 25% | 25% | 25% | 25% | 50 | 50
43 | 0.15 | 0.15 | 64% | 12% | 12% | 12% | 25 | 25
44 | 0.15 | 0.30 | 64% | 12% | 12% | 12% | 25 | 25
45 | 0.30 | 0.15 | 64% | 12% | 12% | 12% | 25 | 25
46 | 0.15 | 0.15 | 25% | 25% | 25% | 25% | 25 | 25
47 | 0.15 | 0.30 | 25% | 25% | 25% | 25% | 25 | 25
48 | 0.30 | 0.15 | 25% | 25% | 25% | 25% | 25 | 25
Note. ICC = intercept intraunit correlation coefficient; % to kth HS = percentage of a MS's students attending its kth affiliated high school.
In a summary of four studies that assessed reasonable values for two-level models' ρ
The degree of cross-classification was operationalized using two factors, including the number of HSs attended by students at each MS and the distribution of the middle school students across the HSs. Luo and Kwok ([
TABLE 2 Cross-Classification Pattern Simulated in Two High Schools Conditions
High school Middle school HS1 HS2 HS3 HS4 HS5 ... HS HS MS1 × × MS2 × × MS3 × × MS4 × × MS5 × ... ... ... ... ... ... ... ... ... ... MS ... × × MS × ×
TABLE 3 Cross-Classification Pattern Simulated in Four High Schools Conditions
High school Middle school HS1 HS2 HS3 HS4 HS5 ... HS HS MS1 × × × × MS2 × × × × MS3 × × × × MS4 × × × × MS5 × × × ... ... ... ... ... ... ... ... ... ... MS ... × × MS × ... × ×
Two patterns of distributions of MSs feeding into each HS were investigated including a balanced and an unbalanced distribution. The distributional pattern was reflected in the percent of students attending each of the affiliated HSs (with unbalanced and balanced distributions operationalized respectively as 64%:36% and 50%:50% for the two cross-classifications conditions, and 64%:12%:12%:12% versus 25%:25%:25%:25% for the four cross-classifications conditions).
Two values (m = 25 and m = 50) were used for the number of units per cross-classification factor (i.e., the number of MSs and the number of HSs). The value of m was generated to be the same for both MSs and HSs in the present study. Each value of m was paired with two values (25 and 50) for the average number of students per MS. Meyers and Beretvas ([
Correct model identification rates were tallied across replications for each combination of conditions for the two generating models and for each of the 16 information criteria being investigated. Correct model identification rates were also gathered for FixedX and RandomX scenarios using the χ
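The tallying of correct model identifications described above reduces to checking, per replication, whether the correctly specified model attains the smallest criterion value. A minimal sketch (function name and the three hypothetical replications are ours):

```python
def correct_identification_rate(triples):
    """Proportion of replications in which the correctly specified model
    has the smallest criterion value. Each triple holds one replication's
    criterion values for the (underparameterized, correctly specified,
    overparameterized) models."""
    hits = sum(1 for under, correct, over in triples
               if correct < under and correct < over)
    return hits / len(triples)

# Three hypothetical replications: the correct model "wins" in two of them.
rate = correct_identification_rate([(510.2, 504.8, 506.1),
                                    (498.7, 500.3, 501.9),
                                    (620.0, 611.5, 613.2)])
```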
Table 4 contains the proportion of converged solutions (out of the 500 replications for each of the 48 combinations of conditions). The proportion of converged solutions is presented for each of the three models estimated for FixedX and for RandomX datasets. In addition, the overall proportion of replications for which converged solutions were obtained across all three estimated models is also presented in the last two columns of Table 4 for FixedX and RandomX datasets.
TABLE 4 Proportion of Converged Solutions (Out of 500), by Condition, Generating Model, and Estimating Model
Condition | FixedX: Under. | FixedX: Correct | FixedX: Over. | RandomX: Under. | RandomX: Correct | RandomX: Over. | Overall: FixedX | Overall: RandomX
1 | 97.8% | 78.6% | 78.8% | 95.0% | 76.8% | 76.8% | 76.2% | 73.6%
2 | 100.0% | 75.2% | 75.0% | 97.4% | 74.6% | 75.0% | 74.8% | 73.6%
3 | 99.6% | 98.8% | 98.8% | 96.8% | 95.8% | 95.8% | 98.6% | 95.0%
4 | 98.4% | 80.2% | 79.8% | 97.0% | 78.8% | 78.8% | 78.4% | 77.0%
5 | 100.0% | 76.8% | 77.8% | 96.4% | 74.0% | 74.0% | 76.8% | 73.6%
6 | 100.0% | 99.4% | 99.2% | 97.2% | 97.2% | 97.2% | 99.2% | 96.8%
7 | 99.0% | 82.4% | 82.0% | 97.2% | 83.2% | 82.8% | 80.6% | 81.0%
8 | 100.0% | 87.8% | 88.2% | 96.8% | 85.4% | 85.0% | 87.6% | 84.2%
9 | 100.0% | 100.0% | 99.8% | 98.0% | 97.8% | 97.8% | 99.8% | 97.8%
10 | 98.0% | 85.0% | 84.6% | 97.4% | 84.2% | 83.8% | 82.8% | 82.4%
11 | 100.0% | 90.2% | 90.6% | 98.4% | 86.6% | 86.6% | 90.2% | 86.4%
12 | 99.4% | 99.6% | 99.4% | 97.8% | 97.8% | 97.6% | 99.4% | 97.6%
13 | 99.0% | 83.0% | 82.8% | 96.2% | 78.6% | 78.6% | 81.4% | 76.0%
14 | 100.0% | 80.8% | 80.4% | 96.6% | 77.6% | 78.2% | 80.0% | 76.4%
15 | 99.8% | 100.0% | 100.0% | 98.0% | 97.8% | 97.8% | 99.8% | 97.6%
16 | 99.6% | 90.2% | 90.4% | 96.4% | 86.4% | 86.2% | 89.6% | 84.6%
17 | 100.0% | 91.2% | 91.2% | 99.2% | 89.0% | 89.0% | 91.2% | 88.6%
18 | 100.0% | 100.0% | 100.0% | 98.8% | 98.8% | 98.6% | 100.0% | 98.6%
19 | 99.0% | 90.4% | 91.0% | 98.6% | 88.4% | 87.8% | 89.2% | 87.2%
20 | 100.0% | 93.2% | 93.2% | 97.0% | 92.6% | 93.0% | 93.0% | 91.8%
21 | 100.0% | 100.0% | 100.0% | 97.6% | 98.0% | 98.0% | 100.0% | 97.6%
22 | 100.0% | 95.8% | 95.2% | 99.0% | 95.0% | 94.2% | 95.2% | 93.8%
23 | 100.0% | 98.8% | 98.8% | 99.6% | 99.0% | 99.0% | 98.8% | 98.8%
24 | 100.0% | 100.0% | 100.0% | 99.0% | 99.0% | 99.0% | 100.0% | 99.0%
25 | 100.0% | 83.8% | 84.0% | 99.4% | 85.8% | 86.0% | 83.2% | 84.8%
26 | 100.0% | 88.6% | 88.8% | 100.0% | 87.8% | 88.2% | 88.6% | 87.8%
27 | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0%
28 | 100.0% | 87.4% | 87.4% | 100.0% | 87.8% | 88.2% | 86.6% | 87.8%
29 | 100.0% | 90.0% | 90.0% | 100.0% | 90.6% | 90.6% | 90.0% | 90.4%
30 | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0%
31 | 92.8% | 72.8% | 70.8% | 75.6% | 62.0% | 61.6% | 66.4% | 55.4%
32 | 99.8% | 73.8% | 74.2% | 84.0% | 67.0% | 66.2% | 73.0% | 64.0%
33 | 96.6% | 96.8% | 95.8% | 85.2% | 83.2% | 82.4% | 95.0% | 81.4%
34 | 93.6% | 67.8% | 65.6% | 79.2% | 63.8% | 61.4% | 62.4% | 55.4%
35 | 100.0% | 78.8% | 78.6% | 85.4% | 64.4% | 64.2% | 78.0% | 62.2%
36 | 97.6% | 97.4% | 95.8% | 85.0% | 85.6% | 84.2% | 95.2% | 83.0%
37 | 100.0% | 92.2% | 92.0% | 100.0% | 92.6% | 92.4% | 92.0% | 92.4%
38 | 100.0% | 92.8% | 92.8% | 100.0% | 92.6% | 92.8% | 92.8% | 92.6%
39 | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0%
40 | 100.0% | 97.4% | 97.4% | 100.0% | 98.0% | 97.8% | 97.2% | 97.8%
41 | 100.0% | 99.8% | 99.8% | 100.0% | 97.6% | 97.6% | 99.8% | 97.6%
42 | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0%
43 | 92.6% | 75.2% | 74.2% | 78.8% | 65.4% | 64.6% | 70.0% | 58.0%
44 | 100.0% | 80.8% | 80.6% | 87.0% | 70.2% | 70.2% | 80.2% | 68.4%
45 | 97.2% | 97.8% | 97.0% | 85.2% | 84.6% | 83.8% | 96.0% | 82.6%
46 | 97.2% | 86.0% | 84.6% | 89.0% | 76.6% | 76.6% | 83.6% | 73.0%
47 | 100.0% | 87.8% | 87.8% | 91.6% | 79.0% | 78.4% | 87.4% | 77.6%
48 | 99.2% | 99.4% | 99.4% | 88.8% | 89.2% | 88.8% | 99.2% | 87.6%
Note. Under. = underparameterized model; Correct = correctly specified model; Over. = overparameterized model; Overall = proportion of replications in which all three estimated models converged.
The pattern of convergence rates was very similar across generating models. Convergence rates were consistently better for the FixedX generating model's conditions than the RandomX conditions in which two additional random effects' variances were generated and estimated. Substantially higher convergence rates were found for the underparameterized models versus the correct and overparameterized models. Insubstantial differences in convergence rates were found between the correct and overparameterized models estimated for both FixedX and RandomX datasets. The mean overall convergence rates across conditions for FixedX and RandomX data for the underparameterized model were 99.1% and 94.9%, respectively. The corresponding rates for the correct and overparameterized model were 90.1% and 89.9% for FixedX data and 86.6% and 86.4% for RandomX data, respectively.
Although some mention will be made of differences found across models, the focus in the remaining presentation of convergence rates will be on the results found in the last two columns of Table 4 summarizing rates when solutions converged for all three models estimated per replication. These results closely match those found for the correct and overparameterized models for both FixedX and RandomX data. Differences across conditions were not especially distinct when the underparameterized model was estimated.
Several factors seemed to have had particularly strong effects on convergence rates: the sample size and the value and pattern of intercept ρ
Convergence rates were also found to be a function of the intercept ρ
The cross-classification distribution and degree also affected convergence rates. Convergence was found to be better when there were more cross-classifications than when there were fewer. In other words, in conditions where each HS was fed by two MSs (see Table 2), the convergence rates were substantially lower than when each HS was fed by four MSs (see Table 3). In addition, the more balanced distributions (50%:50% versus 64%:36%, and 25%:25%:25%:25% versus 64%:12%:12%:12% for the two and four HS conditions, respectively) led to better convergence rates. However, the effect of the number of cross-classifications appeared to be stronger than the effect of the distribution of cross-classifications.
Fit of three (correct, under- and overparameterized) models was compared simultaneously for each dataset. Tables 5 and 6 contain the proportions of correct model identifications for 14 of the 16 information criteria for the FixedX and RandomX data, respectively. The pattern of results for FixedX and for RandomX data was very similar. Results for the default HQIC and BIC values used by SAS (here, termed HQIC
TABLE 5 Correct Model Identification Rates Using Different N*s Paired With Each Information Criterion for the FixedX Generating Model, by Condition
Version of AICC Version of HQIC Version of BIC Version of CAIC Condition AIC AICC AICC AICC HQIC HQIC HQIC BIC BIC BIC CAIC1 CAIC CAIC CAIC 1 82.94% 83.20% 90.03% 86.09% 94.23% 90.03% 90.55% 98.69% 94.23% 95.80% 65.62% 99.21% 96.33% 97.11% 2 81.02% 81.82% 89.04% 85.56% 94.92% 89.04% 90.64% 99.20% 94.92% 96.52% 64.17% 99.47% 96.79% 97.86% 3 78.91% 79.11% 88.03% 83.98% 94.52% 88.03% 90.26% 98.99% 94.52% 96.15% 62.27% 99.19% 96.96% 97.97% 4 82.14% 82.14% 88.27% 85.71% 93.62% 88.27% 90.31% 98.98% 93.62% 94.64% 67.60% 99.24% 95.92% 97.19% 5 82.29% 82.29% 91.41% 88.02% 95.57% 91.41% 93.49% 99.48% 95.57% 97.14% 63.80% 99.48% 97.40% 97.92% 6 80.85% 81.05% 88.31% 84.48% 92.54% 88.31% 89.32% 98.59% 92.54% 95.57% 64.32% 99.19% 96.57% 97.18% 7 81.14% 81.89% 92.80% 87.35% 92.80% 84.37% 87.35% 98.02% 90.57% 92.80% 62.28% 98.76% 94.05% 95.78% 8 82.88% 83.11% 94.98% 91.10% 94.98% 86.30% 91.10% 100.00% 92.92% 94.98% 66.44% 100.00% 95.43% 98.63% 9 81.16% 81.76% 93.39% 87.17% 92.39% 84.17% 87.17% 88.38% 89.98% 92.39% 63.53% 85.57% 92.18% 92.79% 10 81.88% 82.13% 93.48% 87.92% 93.48% 85.02% 87.92% 99.52% 90.58% 93.48% 68.12% 99.52% 95.41% 97.59% 11 82.04% 82.26% 94.90% 87.14% 94.90% 85.14% 87.14% 98.89% 92.02% 94.90% 68.51% 99.11% 95.57% 96.90% 12 79.68% 79.88% 91.95% 87.12% 91.55% 83.30% 86.92% 91.35% 88.73% 91.55% 66.00% 88.73% 91.15% 93.56% 13 82.06% 82.31% 89.93% 84.03% 95.09% 89.93% 93.37% 98.53% 95.09% 97.79% 66.09% 99.26% 97.54% 98.03% 14 83.00% 83.00% 89.00% 84.25% 94.00% 89.00% 91.75% 99.25% 94.25% 96.25% 66.50% 99.50% 96.00% 98.00% 15 82.16% 82.77% 89.38% 84.17% 94.79% 89.38% 91.98% 99.00% 94.99% 97.60% 66.93% 99.20% 96.99% 98.40% 16 78.80% 79.69% 86.61% 82.14% 94.20% 86.61% 90.40% 97.99% 94.20% 96.88% 62.28% 98.66% 96.43% 97.99% 17 83.11% 83.11% 88.60% 84.87% 95.40% 88.60% 92.76% 98.68% 95.40% 97.59% 64.69% 98.90% 96.93% 98.47% 18 83.00% 83.60% 89.20% 84.80% 93.60% 89.20% 91.40% 99.00% 93.60% 97.20% 69.80% 99.40% 97.00% 98.60% 19 81.39% 81.61% 94.17% 85.43% 94.17% 85.43% 
91.48% 98.21% 92.60% 95.96% 62.56% 99.10% 94.40% 97.53% 20 86.02% 86.02% 94.19% 88.39% 94.19% 88.39% 91.18% 99.14% 92.69% 95.91% 70.54% 99.36% 94.84% 98.07% 21 79.80% 80.20% 92.00% 83.40% 91.20% 83.40% 89.20% 90.40% 89.80% 92.20% 64.00% 88.80% 91.60% 91.80% 22 82.77% 82.98% 94.96% 85.71% 94.96% 85.71% 91.18% 98.53% 92.44% 96.43% 65.55% 99.16% 95.38% 97.90% 23 80.57% 81.17% 93.73% 85.83% 93.73% 85.83% 89.68% 98.99% 91.50% 95.95% 62.35% 99.60% 94.33% 97.57% 24 79.60% 79.60% 92.40% 82.60% 91.40% 82.60% 88.20% 91.00% 89.40% 92.80% 62.00% 89.80% 92.60% 92.20% 25 83.41% 83.89% 88.94% 86.78% 96.64% 88.94% 90.39% 98.80% 95.43% 97.12% 61.78% 99.28% 97.60% 97.60% 26 82.84% 83.07% 88.94% 86.01% 96.16% 88.94% 90.75% 99.55% 95.03% 96.84% 67.49% 100.00% 96.84% 97.97% 27 82.20% 82.00% 88.80% 86.80% 95.60% 88.80% 91.20% 99.00% 94.60% 96.60% 63.40% 99.00% 97.00% 97.80% 28 83.37% 83.83% 89.15% 87.07% 96.54% 89.15% 90.99% 99.08% 95.61% 97.00% 65.13% 99.54% 97.23% 97.92% 29 84.67% 84.44% 89.33% 87.56% 95.56% 89.33% 91.56% 99.11% 94.89% 96.22% 66.00% 99.33% 97.56% 98.22% 30 81.80% 82.20% 88.60% 85.60% 95.40% 88.60% 91.00% 99.40% 94.80% 96.20% 65.40% 99.80% 96.80% 97.60% 31 81.33% 81.63% 93.07% 85.84% 91.87% 83.13% 85.84% 98.49% 89.76% 93.07% 68.37% 98.80% 93.68% 95.48% 32 85.21% 86.03% 96.16% 91.78% 95.34% 89.86% 91.78% 99.45% 94.25% 96.16% 67.40% 99.18% 96.71% 97.53% 33 81.26% 81.90% 91.37% 87.16% 89.68% 84.63% 87.16% 88.63% 89.05% 90.53% 69.26% 85.90% 91.58% 91.58% 34 81.09% 81.41% 95.19% 88.78% 94.23% 85.90% 88.78% 98.72% 91.35% 95.19% 65.71% 99.68% 97.12% 97.76% 35 82.82% 82.82% 93.85% 88.97% 92.82% 85.90% 88.97% 97.44% 91.80% 93.85% 64.36% 97.95% 94.87% 94.87% 36 81.30% 81.93% 93.28% 88.03% 92.65% 85.29% 87.82% 90.97% 90.55% 92.86% 66.18% 88.66% 93.07% 92.86% 37 79.78% 80.00% 87.39% 81.96% 93.48% 87.39% 90.65% 99.35% 91.96% 95.65% 65.00% 99.57% 95.44% 98.04% 38 84.27% 84.91% 91.16% 86.42% 95.04% 91.16% 92.67% 99.35% 94.40% 97.41% 68.32% 99.35% 97.20% 98.28% 39 82.00% 82.40% 90.00% 
83.60% 94.00% 90.00% 91.80% 99.40% 93.40% 96.00% 62.80% 100.00% 95.80% 98.00% 40 80.86% 81.07% 88.48% 83.95% 92.80% 88.48% 91.15% 99.79% 92.39% 95.27% 63.37% 100.00% 94.86% 97.12% 41 82.16% 82.37% 88.38% 83.57% 95.79% 88.38% 93.59% 99.80% 95.59% 98.20% 66.53% 99.80% 97.60% 99.20% 42 80.20% 80.60% 87.00% 82.40% 93.40% 87.00% 89.80% 98.60% 92.20% 96.20% 62.00% 99.00% 95.80% 98.00% 43 78.86% 79.71% 94.86% 85.14% 94.00% 85.14% 90.00% 99.14% 91.71% 96.86% 62.00% 99.43% 95.71% 98.00% 44 78.80% 80.30% 92.77% 83.29% 92.02% 83.29% 88.28% 98.75% 89.78% 96.01% 61.35% 98.75% 94.51% 97.76% 45 80.63% 81.25% 91.04% 84.58% 89.38% 84.58% 86.88% 90.63% 87.50% 90.83% 66.25% 88.75% 90.83% 91.25% 46 84.45% 85.41% 95.46% 87.32% 93.78% 87.32% 91.39% 98.33% 92.82% 96.17% 69.62% 99.04% 95.46% 96.89% 47 84.67% 85.58% 95.65% 87.19% 94.74% 87.19% 92.45% 99.54% 93.59% 97.94% 68.88% 99.54% 96.80% 99.09% 48 79.84% 79.84% 91.13% 83.27% 89.92% 83.07% 88.31% 90.12% 88.91% 90.32% 64.92% 86.29% 90.73% 90.32%
TABLE 6 Correct Model Identification Rates Using Different N*s Paired With Each Information Criterion for the RandomX Generating Model, by Condition
                   Version of AICC         Version of HQIC       Version of BIC        Version of CAIC
Condition  AIC     AICC   AICC   AICC      HQIC   HQIC   HQIC    BIC    BIC    BIC     CAIC1  CAIC   CAIC   CAIC
1 80.16% 80.44% 90.22% 83.15% 94.02% 89.40% 92.66% 98.64% 94.02% 96.47% 60.87% 98.64% 96.20% 97.28%
2 83.15% 83.42% 90.49% 85.05% 95.65% 89.95% 92.66% 98.91% 95.65% 97.83% 65.49% 99.46% 97.01% 98.64%
3 81.90% 82.32% 88.84% 83.79% 94.11% 88.42% 90.95% 98.74% 94.11% 97.47% 65.47% 98.53% 96.84% 98.95%
4 81.04% 81.82% 90.91% 83.64% 93.25% 90.13% 92.21% 99.48% 93.25% 96.62% 62.86% 99.74% 96.62% 98.44%
5 82.34% 82.88% 91.30% 85.33% 94.84% 90.22% 92.39% 99.19% 94.84% 97.28% 69.02% 99.73% 97.01% 98.37%
6 80.58% 80.58% 88.22% 83.47% 92.56% 87.60% 90.50% 97.73% 92.56% 95.87% 62.19% 98.76% 95.04% 97.52%
7 79.26% 79.75% 95.80% 83.46% 93.33% 82.96% 89.38% 99.26% 90.12% 95.31% 62.96% 99.26% 94.57% 97.78%
8 83.61% 83.85% 97.15% 87.65% 95.72% 86.94% 92.40% 99.29% 92.87% 97.15% 65.80% 99.53% 96.44% 98.34%
9 83.23% 83.85% 91.62% 87.73% 91.82% 87.32% 90.80% 86.30% 91.41% 90.59% 64.62% 83.44% 91.21% 91.00%
10 81.31% 81.31% 97.09% 87.62% 94.90% 85.19% 90.05% 99.52% 91.02% 97.09% 64.81% 99.76% 95.63% 98.06%
11 82.18% 82.87% 96.30% 87.50% 94.21% 86.11% 91.44% 98.38% 91.90% 95.37% 63.66% 99.07% 94.91% 98.15%
12 80.33% 80.33% 92.62% 85.25% 91.39% 84.84% 88.93% 88.73% 89.75% 92.01% 64.75% 86.48% 91.80% 91.60%
13 83.42% 83.95% 91.84% 85.00% 95.53% 90.79% 93.16% 98.42% 95.53% 96.84% 70.26% 99.21% 96.32% 98.42%
14 85.86% 85.60% 92.15% 87.70% 95.81% 90.84% 93.72% 98.43% 95.81% 97.64% 70.42% 99.22% 96.86% 98.43%
15 84.02% 84.43% 90.16% 86.07% 94.88% 89.75% 92.62% 98.57% 94.88% 96.93% 65.37% 98.57% 96.72% 97.95%
16 82.03% 82.27% 91.25% 83.45% 95.27% 89.36% 93.85% 98.82% 95.27% 97.64% 66.67% 99.53% 97.40% 98.58%
17 81.72% 81.94% 88.71% 82.84% 93.91% 88.26% 91.20% 99.32% 93.91% 97.29% 65.69% 99.32% 96.39% 98.87%
18 80.12% 80.12% 86.82% 81.74% 92.09% 85.80% 89.45% 98.78% 92.09% 96.76% 62.48% 99.59% 95.54% 97.97%
19 84.17% 83.95% 98.39% 90.83% 96.79% 90.60% 93.58% 99.77% 94.50% 97.94% 65.60% 99.77% 96.79% 99.31%
20 78.43% 79.09% 96.73% 85.62% 92.38% 84.10% 88.67% 99.13% 89.98% 95.21% 62.09% 99.35% 94.12% 97.39%
21 81.15% 81.35% 91.80% 85.25% 90.37% 85.04% 87.30% 88.32% 88.32% 90.78% 64.75% 84.63% 90.57% 91.19%
22 82.52% 82.94% 98.08% 87.63% 94.03% 86.99% 91.05% 99.15% 91.90% 95.95% 62.90% 99.36% 95.31% 98.51%
23 79.96% 79.76% 96.96% 84.21% 93.73% 83.40% 88.87% 98.99% 90.89% 96.36% 61.74% 98.99% 94.94% 97.37%
24 80.81% 81.41% 93.33% 86.47% 91.52% 85.46% 88.28% 90.10% 89.50% 92.53% 66.87% 88.28% 91.92% 92.32%
25 83.02% 83.49% 90.57% 84.43% 95.52% 88.92% 92.45% 99.29% 94.58% 97.64% 64.39% 99.76% 97.41% 98.59%
26 84.51% 84.97% 91.34% 86.56% 96.58% 90.21% 92.94% 99.54% 96.36% 98.63% 66.97% 99.77% 98.41% 99.54%
27 82.20% 81.80% 88.40% 83.00% 93.40% 86.60% 90.20% 98.40% 92.20% 95.60% 63.80% 99.20% 95.20% 97.00%
28 80.18% 80.18% 88.16% 81.78% 94.31% 87.02% 91.12% 99.77% 92.94% 97.04% 63.55% 99.77% 96.13% 98.86%
29 82.30% 82.74% 91.37% 84.74% 96.24% 90.27% 92.26% 99.56% 96.02% 98.23% 63.50% 99.56% 98.01% 99.34%
30 81.80% 82.00% 89.00% 84.20% 93.40% 88.40% 90.80% 97.60% 93.00% 95.60% 64.60% 98.20% 94.80% 97.00%
31 80.51% 81.59% 96.39% 86.64% 92.78% 85.56% 89.89% 97.83% 91.34% 96.03% 61.73% 98.56% 96.03% 97.47%
32 80.94% 81.56% 95.63% 86.88% 93.13% 86.25% 89.69% 97.19% 90.94% 95.31% 61.56% 97.81% 94.38% 96.88%
33 84.03% 84.52% 94.10% 88.21% 92.14% 87.72% 89.44% 87.72% 90.42% 93.37% 64.13% 85.75% 93.61% 90.66%
34 83.76% 84.12% 96.75% 87.37% 94.22% 86.64% 91.34% 98.56% 91.70% 96.03% 66.79% 99.64% 95.67% 97.47%
35 79.42% 79.74% 95.50% 85.21% 93.25% 84.57% 91.32% 98.39% 92.28% 95.18% 64.95% 99.36% 94.53% 96.79%
36 79.52% 80.00% 92.29% 85.30% 89.88% 84.58% 86.75% 89.16% 89.16% 90.60% 63.61% 85.78% 90.36% 90.12%
37 84.20% 84.85% 91.99% 85.93% 96.32% 91.56% 94.37% 99.35% 96.32% 97.62% 66.67% 99.57% 97.40% 98.05%
38 81.43% 81.64% 92.44% 84.67% 97.62% 90.07% 94.17% 99.78% 96.76% 98.27% 63.50% 99.78% 98.06% 99.35%
39 82.40% 82.60% 89.40% 84.40% 95.20% 88.00% 91.80% 99.20% 95.00% 97.20% 66.60% 99.80% 97.00% 98.20%
40 83.64% 83.85% 91.82% 85.48% 95.91% 89.98% 93.87% 99.39% 95.30% 97.14% 65.85% 99.80% 96.93% 98.16%
41 84.43% 84.02% 91.39% 85.04% 95.90% 91.19% 93.44% 99.18% 95.29% 98.16% 68.03% 99.18% 98.16% 98.77%
42 81.00% 81.20% 91.40% 83.80% 95.00% 89.80% 92.40% 99.20% 94.60% 97.00% 63.20% 99.20% 96.80% 98.40%
43 79.31% 80.35% 95.52% 83.79% 92.41% 82.76% 87.93% 96.90% 88.62% 94.48% 60.00% 98.62% 94.14% 96.55%
44 83.33% 83.63% 96.49% 87.14% 94.15% 86.84% 89.77% 98.54% 91.23% 96.20% 66.96% 99.12% 96.20% 97.95%
45 80.63% 81.11% 93.71% 86.44% 92.98% 84.99% 90.07% 91.28% 90.80% 93.46% 60.53% 87.89% 92.98% 92.98%
46 83.29% 84.11% 95.07% 86.85% 92.33% 85.75% 90.41% 98.08% 90.69% 94.80% 64.66% 98.36% 94.52% 97.26%
47 81.96% 82.47% 96.39% 85.83% 93.30% 85.31% 88.92% 97.68% 90.72% 95.36% 68.81% 98.97% 94.59% 97.42%
48 82.42% 83.56% 93.84% 87.44% 92.47% 87.22% 90.41% 93.15% 91.78% 92.92% 70.09% 90.41% 92.92% 93.61%
The default CAIC value (here, CAIC
The performance of the AIC and AICC
Correct model identification rates for the modified versions of the AICC, HQIC, BIC and CAIC were also assessed. Of the three modifications to the AICC, the AICC
There also seemed to be a slight effect of the pattern and values of the intercept ρ
TABLE 7 Mean Correct Model Identification Rates for FixedX Data Summarized by Number of Cross-Classification Units and Intraunit Correlation Coefficient Generating Values, by Condition
Conditions          Version of AICC         Version of HQIC       Version of BIC        Version of CAIC
Units  ρIUCC    AIC    AICC   AICC   AICC   HQIC   HQIC   HQIC    BIC    BIC    BIC     CAIC1  CAIC   CAIC   CAIC
50  0.15,0.15  81.67% 82.02% 88.60% 84.72% 94.57% 88.60% 90.98% 98.90% 94.06% 96.27% 64.61% 99.34% 96.42% 97.63%
50  0.15,0.30  82.92% 83.13% 89.48% 85.78% 95.31% 89.48% 92.15% 99.30% 95.01% 97.02% 65.94% 99.48% 97.04% 98.24%
50  0.30,0.15  81.39% 81.72% 88.66% 84.48% 94.23% 88.66% 90.85% 99.00% 93.83% 96.44% 64.62% 99.35% 96.62% 97.94%
25  0.15,0.15  81.61% 82.10% 94.25% 86.69% 93.66% 85.25% 89.24% 98.62% 91.48% 95.00% 65.52% 99.19% 95.15% 97.12%
25  0.15,0.30  82.88% 83.41% 94.53% 87.96% 94.09% 86.49% 90.07% 99.03% 92.32% 95.71% 66.23% 99.19% 95.38% 97.55%
25  0.30,0.15  80.41% 80.80% 92.07% 85.42% 91.02% 83.88% 87.71% 90.18% 89.24% 91.68% 65.27% 87.81% 91.72% 92.04%
The modifications of the value used for N* in the HQIC formula (see Equation 6) worked substantially better than the default value of one used in SAS PROC MIXED (when used to estimate CCREMs) for which the HQIC
As mentioned earlier, the default BIC
The performance of four versions of the CAIC is also presented in Tables 5 and 6. The default CAIC
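The competing formulations above differ only in their penalty terms and in which sample-size value N* enters those penalties. As a minimal sketch (assuming the standard PROC MIXED formulations of each criterion; the function name and example numbers here are illustrative, not from the article), the criteria can be computed from a model's deviance as:

```python
import math

def info_criteria(neg2ll: float, q: int, n_star: float) -> dict:
    """Information criteria for a fitted mixed model.

    neg2ll : -2 * (maximized log-likelihood), i.e., the deviance
    q      : number of estimated parameters
    n_star : sample-size value plugged into the penalty, e.g., the number
             of level-1 units (N), the average number of classification
             units (m), or the number of nonempty cross-classified cells (c)
    """
    return {
        "AIC":  neg2ll + 2 * q,
        "AICC": neg2ll + 2 * q * n_star / (n_star - q - 1),  # requires n_star > q + 1
        "HQIC": neg2ll + 2 * q * math.log(math.log(n_star)),
        "BIC":  neg2ll + q * math.log(n_star),
        "CAIC": neg2ll + q * (math.log(n_star) + 1),
    }

# Same fit (deviance = 5000, q = 6 parameters) evaluated with N* = 1000
# level-1 units versus N* = 25 classification units: the consistent
# criteria (BIC, CAIC) penalize far more heavily under the larger N*.
with_N = info_criteria(5000.0, 6, 1000)
with_m = info_criteria(5000.0, 6, 25)
```

Note that the AIC ignores N* entirely, which is why only the AICC, HQIC, BIC, and CAIC have N, m, and c variants in Tables 5 and 6.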
Fit of the correct versus underparameterized (power) and correct versus overparameterized models (Type I error) was compared using the χ
TABLE 8 Percentage of Converged Solutions in Which a Type I Error Occurred When Selecting the Correctly Specified Model Over the Overparameterized Model Using the Deviance Statistic for the FixedX and RandomX Generating Models, by Condition
           Generating model
Condition  FixedX  RandomX
1   6.04%  6.25%
2   5.62%  4.35%
3   5.48%  6.32%
4   6.38%  7.01%
5   4.69%  5.98%
6   7.46%  7.44%
7   7.69%  7.41%
8   5.25%  4.99%
9   5.41%  5.11%
10  7.00%  5.83%
11  5.77%  5.79%
12  8.45%  5.74%
13  4.91%  4.74%
14  6.50%  4.19%
15  6.01%  5.53%
16  6.03%  4.73%
17  4.82%  6.32%
18  6.80%  8.11%
19  5.83%  3.21%
20  5.81%  8.28%
21  6.60%  7.17%
22  5.25%  6.40%
23  6.88%  6.48%
24  6.40%  5.45%
25  4.81%  5.66%
26  5.64%  4.78%
27  5.80%  8.00%
28  4.85%  7.06%
29  5.11%  4.20%
30  5.20%  7.40%
31  7.83%  5.78%
32  3.84%  6.56%
33  6.74%  6.39%
34  5.13%  5.78%
35  6.67%  5.79%
36  5.67%  5.30%
37  8.26%  3.90%
38  5.82%  3.67%
39  7.00%  5.80%
40  7.61%  4.91%
41  4.61%  4.71%
42  7.80%  5.40%
43  5.71%  6.90%
44  7.23%  5.26%
45  7.71%  5.33%
46  4.78%  6.03%
47  4.35%  6.44%
48  6.45%  6.39%
The χ
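In practice, the χ²diff comparison of two nested CCREMs reduces to a likelihood ratio test on their deviances. A minimal sketch for the single-parameter difference examined in this study (df = 1, for which the chi-square survival function has the closed form erfc(√(x/2)); the function name is illustrative):

```python
import math

def chi2diff_pvalue(deviance_reduced: float, deviance_full: float) -> float:
    """p-value for the chi-square difference (likelihood ratio) test of two
    nested models that differ by a single parameter (df = 1).

    For df = 1, P(chi-square > x) = erfc(sqrt(x / 2)).
    """
    x = deviance_reduced - deviance_full  # deviance drop from the added parameter
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

# The df = 1 critical value at alpha = .05 is about 3.84: a deviance drop
# larger than that leads to retaining the additional parameter.
```

For nested models differing by more than one parameter, the same comparison would use the chi-square distribution with df equal to the difference in the number of estimated parameters.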
The present study was designed to extend the work of several researchers who have assessed the performance of information criteria in terms of their correct multilevel model identification rates (Gurka, [
Estimation of random effects' variances in multilevel models is generally quite challenging under many conditions. Estimation of CCREMs' random effects' variances is no exception and thus it was not surprising that in scenarios with larger true intercept ρ
When interpreting this result, it should be remembered that the correct (generating) model included a MS predictor that was omitted when estimating the underparameterized model. Omission of this predictor increases the MS variability which seemingly made the remaining MS and HS ρ
Another unexpected pattern was identified in the convergence rates (see Table 4). The number of cross-classifications per MS (two versus four) had a stronger effect than did the degree of cross-classification (balanced versus unbalanced). Under the balanced conditions, the average per-cross-classification sample size is larger than in the unbalanced conditions. For example, in the 50%:50% conditions, the per-cross-classification cell size (equaling 12.5 for the = 25 conditions) is considerably larger than in the 64%:12%:12%:12% conditions (equaling 6.5 for the = 25 conditions). Yet convergence rates were better for the unbalanced, four-cross-classifications-per-MS conditions than for the balanced, two-cross-classifications-per-MS conditions. This suggests that the fullness of the cross-classification table (compare Tables 2 and 3) improves convergence rates more than does the average cross-classification cell size. Future research should investigate additional patterns of sparseness and sample sizes to test this finding further.
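The per-cell arithmetic described above can be sketched as follows. This is a toy illustration assuming 25 students per middle school split exactly in proportion across feeder high schools; with whole-student allocation the article's averages differ slightly (e.g., 6.5 rather than 6.25 in the four-school case), and the function name is illustrative:

```python
def mean_cell_size(students_per_ms: int, feeder_shares: list) -> float:
    """Average number of students per nonempty middle school x high school
    cell when each middle school's students split across the feeder high
    schools in the given proportions."""
    cells = [students_per_ms * share for share in feeder_shares]
    return sum(cells) / len(cells)

# Balanced: two high schools per middle school (50%:50%)
balanced = mean_cell_size(25, [0.50, 0.50])                # 12.5 students per cell

# Unbalanced: four high schools per middle school (64%:12%:12%:12%)
unbalanced = mean_cell_size(25, [0.64, 0.12, 0.12, 0.12])  # about 6.25 per cell
```

The counterintuitive finding is that the four-school conditions, despite their smaller average cell size, converged more often, presumably because more cells of the cross-classification table were nonempty.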
The primary finding of this study was that the default values of the information criteria used by SAS PROC MIXED for best model fit should not be used. In particular, the default HQIC and BIC values (notated here as HQIC
Use of the default CAIC value (the CAIC
Correct model identification rates for the remaining two default information criteria (the AICC
Results from Whittaker and Furlow's (2009) study of these information criteria's performance in assessing the fit of conventional multilevel models matched those found here. They also found that pairing N with the BIC and CAIC worked better than using m, and that these two consistent information criteria worked better than the efficient HQIC, AICC, and AIC. In addition, similar to Whittaker and Furlow's results, the differences between using N and m were not very substantial. The same held in the present study: pairing the relevant information criteria with the number of nonempty cross-classified cells, c, worked reasonably well and not substantially differently from using m or N as the N*. Gurka's (2006) results, on the other hand, had supported the use of m over N for conventional multilevel models. It is known that m plays a more important part than N in power for multilevel designs (e.g., Raudenbush & Liu, [
Two primary factors were found to influence functioning of the modified information criteria including the number of units per classification factor, m, and the value and pattern of the intercept ρ
Performance of the χ
Estimation of CCREMs works better in scenarios where more variability is attributable to the classification factors and in scenarios where there are more cross-classifications. Note, however, that the present study only assessed performance of FIML estimation. Future research could explore this pattern of results when REML estimation is used to estimate the CCREMs. More importantly, use of MCMC estimation was not assessed in the present study and should be assessed especially for sparse cross-classification conditions. In addition, performance of the deviance information criterion (Spiegelhalter, Best, Carlin, & van der Linde, 2002) used with MCMC estimation should also be assessed.
The present study only looked at a small subset of particularly simple CCREMs, which was designed to provide a starting point for assessing how the information criteria function for CCREM model selection. Although information criteria are more typically used with nonnested models and with models that differ by more than a single parameter, the under- and overparameterized models compared in the present study differed from the correct model by only a single fixed effects parameter. Future research should explore performance of the information criteria in more authentic scenarios entailing more complex patterns in which incorrect models combine zero and nonzero true fixed and random effects. Additional scenarios should explore the use of these information criteria for differentiating between the fit of nonnested models.
Use of information criteria for identifying the better fitting model permits simultaneous assessment of the impact on fit of a set of parameters being added to (or removed from) a model. Despite the perceived lack of consensus about the validity of statistical significance testing associated with multilevel model parameter estimates, applied researchers tend to use the statistical significance results for deciding which parameters to keep in a model. Future research could explore the correspondence between inferences associated with specific parameters estimated in a model with inferences that would be made based on a comparison of models' information criteria.
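One way to see that correspondence: for nested models differing by a single parameter, comparing information criteria is equivalent to a deviance test with a criterion-specific critical value. A sketch under that single-parameter assumption (the helper name is illustrative, and the AICC is omitted because its threshold also depends on the number of parameters already in the model):

```python
import math

def ic_threshold(criterion: str, n_star: float) -> float:
    """Deviance drop needed for one extra parameter to improve the criterion.

    A model with one more parameter has a lower AIC iff the deviance falls
    by more than 2; a lower BIC iff it falls by more than ln(N*); a lower
    CAIC iff more than ln(N*) + 1; a lower HQIC iff more than 2 ln(ln(N*)).
    """
    return {
        "AIC":  2.0,
        "HQIC": 2.0 * math.log(math.log(n_star)),
        "BIC":  math.log(n_star),
        "CAIC": math.log(n_star) + 1.0,
    }[criterion]

# With N* = 1000 level-1 units, BIC demands a deviance drop of about 6.9,
# stricter than the 3.84 chi-square critical value at alpha = .05, whereas
# AIC's fixed threshold of 2 is more lenient than that test.
```

This makes explicit why choosing N* matters: the implied "significance threshold" of the consistent criteria grows with whichever sample size is plugged in.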
The present study also explored information criteria functioning only with two-level CCREMs involving two cross-classified factors and, as mentioned, entailed comparisons of relatively simple models. However, the results seem to provide a useful foundation for future research on this topic. Future research can look at more complex patterns of data structure, models, and differences among the models. Once further extensions to the present study have been accomplished, stronger recommendations can be made about which specific value of N* should be used with each of these information criteria. In the meantime, the results of this study support the recommendation that researchers not use the default information criteria reported when SAS PROC MIXED is used to estimate CCREMs. Instead, researchers should use the N*-modified formulations of the information criteria when choosing among CCREM models.
Last, it should be emphasized that although the example used here entailed an educational context, cross-classified data structures are encountered in many other fields. In medical research, for example, patients may be cross-classified by nurses and doctors (Rasbash & Browne, [
A previous version of this article was presented at the 2010 annual meeting of the American Educational Research Association in Denver, Colorado.
By S. Natasha Beretvas and Daniel L. Murphy
S. Natasha (Tasha) Beretvas is a professor of Quantitative Methods at the University of Texas at Austin and a faculty associate of the Population Research Center and the Meadows Center for Preventing Educational Risk. Her research focuses on evaluation of statistical models in educational and social science research with a focus on extensions to the conventional multilevel model to handle sources of data structure complexities.
Daniel L. Murphy currently serves as a Research Scientist in the Research & Innovation Network at Pearson, where his research program includes the use of growth measures, adaptive testing, and data visualization techniques to inform instructional decisions and interventions.