Dummy variables for linear models with multiple levels

2014-04-15

I'm currently working with data that has continuous variables and a hierarchical structure attached to it: think of measuring the blood pressure, size and weight of different domestic animals (cats, dogs, birds), together with their species, family and order.

All data is measured on the level of the individuals, so there are no predictors on higher levels (although they could be generated by taking, e.g. the inter-level mean).
Let's say I want to predict the blood pressure (y) with the help of the weight (x1) and the size (x2).
Ignoring the hierarchical information, I could use a linear model y = β0 + x1·β1 + x2·β2, which might be a very bad idea.
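
A predictor on a higher level, as mentioned above, could be generated by averaging an individual-level measurement within each group. The following is only a minimal sketch in Python: the records, the name `species_mean` and the choice of weight as the averaged variable are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level records: (species, weight).
animals = [("cat", 4.0), ("cat", 5.0), ("dog", 20.0), ("dog", 30.0), ("bird", 1.0)]

# Collect the weights of all individuals that share a species.
weights_by_species = defaultdict(list)
for sp, weight in animals:
    weights_by_species[sp].append(weight)

# One mean per species: this is the generated species-level predictor.
species_mean = {sp: mean(ws) for sp, ws in weights_by_species.items()}

# Attach the species-level value to every individual as an extra column.
with_group_mean = [(sp, weight, species_mean[sp]) for sp, weight in animals]

for row in with_group_mean:
    print(row)
```

The same idea would apply one level up (family or order) by grouping on that label instead.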

What might be the right approach for dummy variables if there are more than two categories?

All replies

ReneeBK

2014-4-15 05:13:06

Your first question: Close, but not quite. You need to either leave one of your species out of the model as a reference category or leave out the constant. In the former case, each β_{i+2} measures the difference between species i and the reference species; the parameter of the reference species is necessarily 0, so its indicator variable drops out of the model. In the latter case, each β_{i+2} is the constant for species i, which leaves nothing for the overall constant β0 to do, so it should drop out.
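
The two valid codings can be checked on a toy design matrix. This is a sketch under stated assumptions, not anyone's actual analysis: the six animals, their weights (x1) and sizes (x2), and the helper names `rank` and `design` are all made up for illustration. It compares the number of columns with the rank of the matrix; a rank below the column count means the coefficients are not identified.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # this column is a combination of earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Hypothetical data for six animals.
species = ["cat", "cat", "dog", "dog", "bird", "bird"]
x1 = [4, 5, 20, 30, 1, 2]        # weight
x2 = [25, 23, 60, 55, 10, 14]    # size
levels = ["cat", "dog", "bird"]

def design(constant, used_levels):
    """Rows of [constant?, x1, x2, one 0/1 indicator per level in used_levels]."""
    rows = []
    for sp, w, s in zip(species, x1, x2):
        row = [1] if constant else []
        row += [w, s]
        row += [1 if sp == lev else 0 for lev in used_levels]
        rows.append(row)
    return rows

X_both = design(True, levels)       # constant + all three indicators
X_ref = design(True, levels[1:])    # constant, "cat" as reference category
X_nocons = design(False, levels)    # no constant, all three indicators

for name, X in [("both", X_both), ("reference", X_ref), ("no constant", X_nocons)]:
    print(name, "columns:", len(X[0]), "rank:", rank(X))
# both: 6 columns, rank 5 (one column is redundant)
# reference: 5 columns, rank 5
# no constant: 5 columns, rank 5
```

With the constant and all three indicators, the indicators sum to the constant column, so one of them has to go. The other two designs span the same column space and therefore give the same fitted values; only the interpretation of the species coefficients differs.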

Your second question: No, all the information is already captured by the species indicator variables (a term I prefer over dummy variable), so there is nothing left for the family indicator variables to explain, and they will be automatically dropped from the model due to perfect multicollinearity.
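
Why the family indicators are redundant can be seen directly: when every species belongs to exactly one family, each family indicator column is the sum of the indicator columns of its species. A minimal sketch, with hypothetical labels (`s1`, `s2`, `s3`, `F1`, `F2`) standing in for real species and families:

```python
# Hypothetical nesting: every species belongs to exactly one family.
family_of = {"s1": "F1", "s2": "F1", "s3": "F2"}

# Species label for each of seven hypothetical animals.
species = ["s1", "s1", "s2", "s3", "s3", "s2", "s1"]
family = [family_of[s] for s in species]

def indicator(labels, level):
    """0/1 column marking the observations that belong to `level`."""
    return [1 if lab == level else 0 for lab in labels]

species_cols = {s: indicator(species, s) for s in family_of}
family_cols = {f: indicator(family, f) for f in set(family_of.values())}

# Rebuild each family column by adding up the columns of its species.
rebuilt = {}
for f in family_cols:
    members = [s for s, fam in family_of.items() if fam == f]
    rebuilt[f] = [sum(vals) for vals in zip(*(species_cols[s] for s in members))]

# True: the family columns are exact linear combinations of the species columns.
redundant = all(rebuilt[f] == family_cols[f] for f in family_cols)
print(redundant)
```

Because the family columns add no new direction to the design matrix, a model that already contains the species indicators cannot estimate separate family coefficients. The same argument would apply to order indicators.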

ReneeBK

2014-4-15 05:13:34

So species indicator variables can be used to reflect a 2-level structure, but they do not reflect the additional structure of a 3- or 4-level model - am I reading this correctly? – Roland

ReneeBK

2014-4-15 05:13:57

I don't understand that comment. I see only two levels: species and families. Where do the 3rd and 4th levels come from? As long as the levels are hierarchical, indicator variables at the lowest level (e.g. species) will absorb all the variance of all the higher levels (e.g. families). – Maarten Buis

ReneeBK

2014-4-15 05:14:29

Oh, I might be using the terminology incorrectly. By 2 levels I mean the individual pet level (level 1) and the species level (level 2): I'd call the simple linear model without species indicators a 1-level model, and as soon as we add species information we have a 2-level model. The third level would be one that includes families as well. A fourth level (order) was not written down in the models, but was mentioned in the introduction. – Roland

ReneeBK

2014-4-15 05:14:50

So family and order indicator variables will add nothing once you have included the species indicator variables. As a consequence, they will be perfectly collinear with the species indicators and will be dropped. – Maarten Buis
