Many simultaneous linear models
Finally, in the next courses we will encounter an approach to improving inference in linear models when Y is not a single M×1 column vector but an M×N matrix, so that there are thousands of such models to fit. Typically such data are presented in a tall format, N×M, with one row per linear model: there are M samples for each model (often in the range of 6 to 100) and N models to fit, for example one linear model for each of thousands of genes. The improved statistical inference comes from treating some of the terms, such as the variance, as arising from a hierarchical model, which essentially shares information across the N linear models. This will be covered in more detail in PH525.3x. Two of the first references for this approach are Lönnstedt and Speed, "Replicated microarray data" (2002), and Smyth, "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments" (2004). The methods in the Smyth paper are implemented in the limma package on Bioconductor. The User Guide of the limma package contains extensive documentation and examples for the analysis of genomics data.
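To make the idea of sharing information across the N linear models concrete, here is a minimal sketch in Python with NumPy. It fits the same two-group design to every row of a simulated N×M matrix and then pulls each model's variance estimate toward the average variance. Everything here is an illustrative assumption rather than the limma procedure: the function names, the simulated data, the two-group design, and in particular the fixed `prior_df`, which limma estimates from the data instead of taking as given.

```python
import numpy as np

def fit_many_linear_models(Y, X):
    """Fit the same design X (M x p) to every row of Y (N x M) by least squares."""
    # lstsq accepts a matrix right-hand side, so all N models are solved at once.
    beta, _, _, _ = np.linalg.lstsq(X, Y.T, rcond=None)   # p x N
    residuals = Y.T - X @ beta                             # M x N
    df = X.shape[0] - X.shape[1]                           # residual degrees of freedom
    s2 = (residuals ** 2).sum(axis=0) / df                 # one variance per model
    return beta.T, s2, df

def shrink_variances(s2, df, prior_df=4.0):
    """Weighted average of each variance and the mean variance across models.

    prior_df is a hand-picked assumption for this sketch, not an estimate.
    """
    prior_var = s2.mean()
    return (prior_df * prior_var + df * s2) / (prior_df + df)

# Simulated example: N = 1000 models, M = 6 samples in two groups of three.
rng = np.random.default_rng(0)
N, M = 1000, 6
group = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones(M), group])   # intercept + group indicator
Y = rng.normal(size=(N, M))                # tall format: one row per model

beta, s2, df = fit_many_linear_models(Y, X)
s2_shrunk = shrink_variances(s2, df)

# t-statistics for the group coefficient, with and without shrinkage.
se_unscaled = np.sqrt(np.linalg.inv(X.T @ X)[1, 1])
t_ordinary = beta[:, 1] / (np.sqrt(s2) * se_unscaled)
t_moderated = beta[:, 1] / (np.sqrt(s2_shrunk) * se_unscaled)
```

With only four residual degrees of freedom per model, the individual variance estimates are very noisy, so a few models get tiny variances and inflated t-statistics by chance. The shrunken variances are less spread out, which stabilizes the statistics for those models. For real analyses, the limma package estimates the prior from the data and adjusts the reference distribution accordingly.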