Snijders & Bosker (2012) give a very nice treatment of their suggested model-building principles. See the section "6.4 Model specification" starting on page 102.
- Considerations relating to the subject matter. These follow from field knowledge, existing theory, detailed problem formulation, and common sense.
- The distinction between effects that are indicated a priori as effects to be tested, that is, effects on which the research is focused, and effects that are necessary to obtain a good model fit. Often the effects tested are a subset of the fixed effects, and the random part is to be fitted adequately but is of secondary interest. When there is no strong prior knowledge about which variables to include in the random part, one may follow a data-driven approach to select the variables for the random part.
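One data-driven check for the random part is a likelihood-ratio test between nested random-effects specifications. Below is a minimal sketch using Python's statsmodels `MixedLM` on simulated data (the variable names `y`, `x`, `g` and the simulation itself are hypothetical, not from the book). Note that testing a variance component places the null hypothesis on the boundary of the parameter space, so the naive chi-square p-value shown here is conservative.

```python
# Hypothetical simulation: does x belong in the random part?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(7)
n_groups, n_per = 40, 25
g = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
u0 = np.repeat(rng.normal(0, 0.6, n_groups), n_per)  # random intercepts
u1 = np.repeat(rng.normal(0, 0.4, n_groups), n_per)  # true random slopes
y = 1 + 0.5 * x + u0 + u1 * x + rng.normal(0, 1, n_groups * n_per)
data = pd.DataFrame({"y": y, "x": x, "g": g})

# Nested models: random intercept only vs. random intercept + slope.
# ML (reml=False) so the likelihoods are comparable.
m0 = smf.mixedlm("y ~ x", data, groups=data["g"]).fit(reml=False)
m1 = smf.mixedlm("y ~ x", data, groups=data["g"], re_formula="~x").fit(reml=False)

lr = 2 * (m1.llf - m0.llf)       # 2 extra parameters: slope variance + covariance
p_naive = stats.chi2.sf(lr, df=2)
print(lr, p_naive)
```

Because the data were simulated with genuine slope variation, the test detects it; with real data the same comparison guides whether the random slope is worth keeping.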
- A preference for 'hierarchical' models in the general sense (not the 'hierarchical linear model' sense) that if a model contains an interaction effect, then the corresponding main effects should usually also be included (even if these are not significant); and if a variable has a random slope, its fixed effect should normally also be included in the model. The reason is that omitting such effects may lead to erroneous interpretations.
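The 'hierarchical' principle can be illustrated in model syntax. The sketch below uses Python's statsmodels `MixedLM` on simulated data; the names (`y`, `x`, `w`, `g`) and the data-generating values are assumptions for illustration, not from the book.

```python
# Hypothetical example of the 'hierarchical' principle in model syntax.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_groups, n_per = 30, 20
g = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)             # level-one predictor
w = np.repeat(rng.normal(size=n_groups), n_per)   # level-two predictor
u0 = np.repeat(rng.normal(0, 0.5, n_groups), n_per)  # random intercepts
u1 = np.repeat(rng.normal(0, 0.3, n_groups), n_per)  # random slopes for x
y = (1 + 0.5 * x + 0.4 * w + 0.3 * x * w
     + u0 + u1 * x + rng.normal(0, 1, n_groups * n_per))
data = pd.DataFrame({"y": y, "x": x, "w": w, "g": g})

# "x * w" expands to x + w + x:w, so both main effects accompany the
# interaction; re_formula="~x" gives x a random slope while its fixed
# effect remains in the model, as the guideline requires.
model = smf.mixedlm("y ~ x * w", data, groups=data["g"], re_formula="~x")
result = model.fit()
print(result.fe_params)
```

Dropping `x` or `w` from the fixed part while keeping `x:w` (or the random slope of `x`) would make the remaining coefficients depend on arbitrary variable origins, which is the erroneous interpretation the guideline warns about.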
- Doing justice to the multilevel nature of the problem.
- Awareness of the necessity of including certain covariances of random effects. Including such covariances means that they are free parameters in the model, not constrained to 0 but estimated from the data. In Section 5.1.2, attention was given to the necessity of including in the model all covariances τ0h between random slopes and the random intercept. Another case in point arises when a categorical variable with c ≥ 3 categories has a random effect. This is implemented by giving random slopes to the c − 1 dummy variables that are used to represent the categorical variable. The covariances between these random slopes should then also be included in the model. Formulated generally, suppose that two variables Xh and Xh′ have random effects, and that the meaning of these variables is such that they could be replaced by two linear combinations, aXh + a′Xh′ and bXh + b′Xh′. (For the random intercept and random slope discussed in Section 5.1.2, the relevant type of linear combination would correspond to a change of origin of the variable with the random slope.) Then the covariance τhh′ between the two random slopes should be included in the model.
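The categorical case can be sketched concretely. In statsmodels `MixedLM` (one possible implementation, assumed here; the variable names are hypothetical), the default random-effects covariance is unstructured, so the covariances between the dummy-variable random slopes are left free rather than fixed at 0, as the guideline requires.

```python
# Hypothetical example: a categorical variable with c = 3 categories
# given a random effect via its c - 1 dummies.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per = 25, 30
g = np.repeat(np.arange(n_groups), n_per)
cat = rng.choice(["a", "b", "c"], size=n_groups * n_per)
u0 = np.repeat(rng.normal(0, 0.5, n_groups), n_per)      # random intercepts
ub = np.repeat(rng.normal(0, 0.3, n_groups), n_per)      # random slope, dummy b
uc = np.repeat(rng.normal(0, 0.3, n_groups), n_per)      # random slope, dummy c
y = (u0 + (cat == "b") * (0.4 + ub) + (cat == "c") * (0.8 + uc)
     + rng.normal(0, 1, n_groups * n_per))
data = pd.DataFrame({"y": y, "cat": cat, "g": g})

# re_formula="~C(cat)" puts the intercept and the two dummies in the
# random part; all pairwise covariances among them are estimated freely.
model = smf.mixedlm("y ~ C(cat)", data, groups=data["g"],
                    re_formula="~C(cat)")
result = model.fit()
# cov_re is the 3x3 estimated covariance matrix of the random effects;
# its off-diagonal entries are the covariances the guideline says to keep.
print(result.cov_re)
```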
- Reluctance to include nonsignificant effects in the model - one could also say, a reluctance to overfit. Each of the first five points above, however, could override this reluctance. An obvious example of this overriding is the case where one wishes to test the effect of X2 while controlling for the effect of X1. The purpose of the analysis is a subject-matter consideration, and even if the effect of X1 is nonsignificant, one should still include it in the model.
- The desire to obtain a good fit, and to include in the model all effects that contribute to it. In practice, this leads to the inclusion of all significant effects, unless the data set is so large that certain effects, although significant, are nevertheless deemed unimportant.
- Awareness of the following two basic statistical facts:
  (a) Every test of a given statistical parameter controls for all other effects in the model used as the null hypothesis (M0 in Section 6.2). Since this set of effects influences both the interpretation and the statistical power, test results may depend on which other effects are included in the model.
  (b) We are constantly making type I and type II errors - especially the latter, since statistical power is often rather low. This implies that a nonsignificant effect is not necessarily absent in the population. It also implies that a significant effect may be so by chance (but the probability of this is no larger than the significance level, most often set at 0.05). Multilevel research is often based on data with a limited number of groups. Since the power for detecting effects of level-two variables depends strongly on the number of groups in the data, warnings about low power are especially important for level-two variables.
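The dependence of power on the number of groups can be shown with a small simulation (entirely hypothetical numbers, chosen for illustration). Each group contributes essentially one independent observation about a level-two variable, so here we regress simulated group means on a level-two predictor and count how often its slope is significant.

```python
# Hypothetical simulation: empirical power for a level-two effect
# as a function of the number of groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def power_level_two(n_groups, effect=0.4, n_sims=500, alpha=0.05):
    """Share of simulations in which the level-two slope is significant."""
    rejections = 0
    for _ in range(n_sims):
        w = rng.normal(size=n_groups)                  # level-two predictor
        ybar = effect * w + rng.normal(size=n_groups)  # group means + error
        slope, intercept, r, p_value, se = stats.linregress(w, ybar)
        rejections += p_value < alpha
    return rejections / n_sims

power_small = power_level_two(n_groups=15)
power_large = power_level_two(n_groups=100)
print(power_small, power_large)
```

With the same effect size, the many-group design has far higher power; adding more observations *within* each group would barely help, which is why the warning is aimed specifically at level-two variables.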
- Providing tested fixed effects with an appropriate error term in the model (whether or not it is significant). For level-two variables, this is the random intercept term (the residual term in (5.7)). For cross-level interactions, it is the random slope of the level-one variable involved in the interaction (the residual term in (5.8)). For level-one variables, it is the regular level-one residual that one would not dream of omitting. This guideline is supported by Berkhof and Kampen (2004).
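A sketch of this last guideline, again with statsmodels `MixedLM` and hypothetical names and simulated values: the data contain genuine slope variation for `x`, and we compare a model that omits the random slope of `x` with one that includes it when testing the cross-level interaction `x:w`. The model without the appropriate error term treats the slope variation as level-one noise and typically understates the uncertainty of the `x:w` estimate.

```python
# Hypothetical example: pairing a tested cross-level interaction
# with its error term (the random slope of x).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_groups, n_per = 30, 20
g = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
w = np.repeat(rng.normal(size=n_groups), n_per)
u0 = np.repeat(rng.normal(0, 0.5, n_groups), n_per)  # random intercepts
u1 = np.repeat(rng.normal(0, 0.4, n_groups), n_per)  # real slope variation
y = (1 + 0.5 * x + 0.3 * w + u0 + u1 * x
     + rng.normal(0, 1, n_groups * n_per))
data = pd.DataFrame({"y": y, "x": x, "w": w, "g": g})

# Without the random slope of x, the x:w test lacks its error term.
no_slope = smf.mixedlm("y ~ x * w", data, groups=data["g"]).fit()
# With re_formula="~x", the cross-level interaction gets the random
# slope as its error term, per the guideline.
with_slope = smf.mixedlm("y ~ x * w", data, groups=data["g"],
                         re_formula="~x").fit()
print(no_slope.bse["x:w"], with_slope.bse["x:w"])
```

Comparing the two standard errors for `x:w` shows how much uncertainty the misspecified model hides; the level-two effect of `w` is covered analogously by the random intercept, which both models include.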