2016-10-07
PARTIAL REGRESSION PLOT

Name:

    PARTIAL REGRESSION PLOT
Type:

    Graphics Command
Purpose:

    Generate a partial regression plot. Note that partial regression plots are also referred to as added variable plots, adjusted variable plots, and individual coefficient plots.
Description:

    When performing a linear regression with a single independent variable, a scatter plot of the response variable against the independent variable provides a good indication of the nature of the relationship. If there is more than one independent variable, things become more complicated. Although it can still be useful to generate scatter plots of the response variable against each of the independent variables, this does not take into account the effect of the other independent variables in the model.

    Partial regression plots attempt to show the effect of adding an additional variable to the model (given that one or more independent variables are already in the model). Partial regression plots are formed by:

    1. Computing the residuals of regressing the response variable against the independent variables but omitting Xi.
    2. Computing the residuals from regressing Xi against the remaining independent variables.
    3. Plotting the residuals from (1) against the residuals from (2).
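The three steps can be sketched in Python with NumPy. This is a hypothetical illustration, not Dataplot's implementation: the helper names `residuals` and `partial_regression_coords` and the synthetic data are assumptions, and every regression is assumed to include an intercept term.

```python
import numpy as np


def residuals(y, Z):
    """Residuals from the least-squares regression of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef


def partial_regression_coords(y, X, i):
    """Return (x_res, y_res): the horizontal and vertical coordinates of the
    partial regression plot for column i of X (X holds the independent
    variables only; an intercept column is added here)."""
    n = len(y)
    others = np.delete(X, i, axis=1)            # every independent variable except Xi
    Z = np.column_stack([np.ones(n), others])   # intercept plus the remaining variables
    y_res = residuals(y, Z)                     # step 1: Y regressed on the others
    x_res = residuals(X[:, i], Z)               # step 2: Xi regressed on the others
    return x_res, y_res                         # step 3: plot y_res against x_res


# Hypothetical synthetic data: 100 observations, three independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=100)

# Coordinates of the partial regression plot for the second variable (index 1).
x_res, y_res = partial_regression_coords(y, X, 1)
```

The returned arrays are simply the x and y coordinates of the plot, so any scatter-plot routine can draw them.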

Velleman and Welsch express this mathematically as:

    Y.[i] versus Xi.[i]

where

    Y.[i] = residuals from regressing Y (the response variable) against all the independent variables except Xi
    Xi.[i] = residuals from regressing Xi against the remaining independent variables.

Velleman and Welsch list the following useful properties for this plot:

  • The least squares linear fit to this plot has slope βi (the coefficient of Xi in the full model) and intercept zero.
  • The residuals from the least squares linear fit to this plot are identical to the residuals from the least squares fit of the original model (Y against all the independent variables including Xi).
  • The influences of individual data values on the estimation of a coefficient are easy to see in this plot.
  • It is easy to see many kinds of failures of the model or violations of the underlying assumptions (nonlinearity, heteroscedasticity, unusual patterns).
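The first two properties can be checked numerically. The sketch below is a hypothetical NumPy example on synthetic data (the coefficients and seed are made up): it fits the full model, builds the partial regression plot for one variable, and compares the least squares line through the plotted points with the full-model results.

```python
import numpy as np


def residuals(y, Z):
    """Residuals from the least-squares regression of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef


# Hypothetical synthetic data with correlated independent variables.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
y = 3.0 + 1.5 * x1 - 2.0 * x2 + 0.8 * x3 + rng.normal(size=n)

# Full model: Y against all independent variables (with intercept).
ones = np.ones(n)
full = np.column_stack([ones, x1, x2, x3])
b_full, *_ = np.linalg.lstsq(full, y, rcond=None)
e_full = y - full @ b_full

# Partial regression plot for x2: remove the intercept, x1 and x3 from both y and x2.
others = np.column_stack([ones, x1, x3])
y_res = residuals(y, others)
x_res = residuals(x2, others)

# Least squares line (slope and intercept) through the plotted points.
slope, intercept = np.polyfit(x_res, y_res, 1)
e_plot = y_res - (intercept + slope * x_res)

print("slope:", slope, " full-model coefficient of x2:", b_full[2])
print("intercept:", intercept)                       # zero up to rounding error
print("max |residual difference|:", np.max(np.abs(e_plot - e_full)))
```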

Partial regression plots are widely discussed in the regression diagnostics literature, so their strengths and weaknesses are not discussed in any detail here.

Partial regression plots are related to, but distinct from, partial residual plots. Partial regression plots are most commonly used to identify leverage points and influential data points that might not be leverage points. Partial residual plots are most commonly used to identify the nature of the relationship between Y and Xi (given the effect of the other independent variables in the model). Note that since the simple correlation between the two sets of residuals plotted is equal to the partial correlation between the response variable and Xi, partial regression plots will show the correct strength of the linear relationship between the response variable and Xi. This is not true for partial residual plots. On the other hand, for the partial regression plot, the x axis is not Xi. This limits its usefulness in determining the need for a transformation (which is the primary purpose of the partial residual plot).
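The contrast between the two plots can be sketched numerically. In this hypothetical NumPy example (synthetic data; the partial residual for x2 is taken as the full-model residual plus b2 times x2, the usual component-plus-residual definition), both plots have a least squares slope equal to the full-model coefficient, but it is the partial regression plot whose correlation matches the partial correlation, here recovered independently from the t statistic via r = t / sqrt(t^2 + df).

```python
import numpy as np


def residuals(y, Z):
    """Residuals from the least-squares regression of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef


# Hypothetical synthetic data: x2 is strongly correlated with x1.
rng = np.random.default_rng(2)
n = 150
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

ones = np.ones(n)
full = np.column_stack([ones, x1, x2])
b, *_ = np.linalg.lstsq(full, y, rcond=None)
e = y - full @ b

# Partial residual plot for x2: the x axis is x2 itself.
pr_x, pr_y = x2, e + b[2] * x2

# Partial regression plot for x2: the x axis is a residual, not x2.
others = np.column_stack([ones, x1])
av_x, av_y = residuals(x2, others), residuals(y, others)

# Partial correlation of y and x2 given x1, computed independently from the
# t statistic of b[2]: r = t / sqrt(t^2 + df), df = residual degrees of freedom.
df = n - 3
se_b2 = np.sqrt((e @ e / df) * np.linalg.inv(full.T @ full)[2, 2])
t = b[2] / se_b2
r_partial = t / np.sqrt(t**2 + df)

print("slopes:", np.polyfit(pr_x, pr_y, 1)[0], np.polyfit(av_x, av_y, 1)[0], b[2])
print("partial regression plot corr:", np.corrcoef(av_x, av_y)[0, 1], "partial corr:", r_partial)
print("partial residual plot corr:", np.corrcoef(pr_x, pr_y)[0, 1])   # generally differs
```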

Dataplot provides two forms for the partial regression plot. You can generate either a single partial regression plot or you can generate a matrix of partial regression plots (one plot for each independent variable in the model).

For the matrix form of the command, a number of SET FACTOR PLOT options can be used to control the appearance of the plot (not all of the SET FACTOR PLOT options apply). These are discussed in the Notes section below.

Syntax 1:


    PARTIAL REGRESSION PLOT <y> <x1> ... <xk> <xi>
                                <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                  <x1> ... <xk> are the independent variables;
                  <xi> is the independent variable for which the partial regression plot is being generated
                                (note that <xi> must be one of the variables listed in <x1> ... <xk>);
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This is the syntax for generating a single partial regression plot.
Syntax 2:

    MATRIX PARTIAL REGRESSION PLOT <y> <x1> ... <xk>
                                <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                  <x1> ... <xk> are the independent variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used to generate a matrix of partial regression plots.
Examples:

    PARTIAL REGRESSION PLOT Y X1 X2 X3 X4 X2
    MATRIX PARTIAL REGRESSION PLOT Y X1 X2 X3 X4

    PARTIAL REGRESSION PLOT Y X1 X2 X3 X4 X2 SUBSET TAG > 2
    MATRIX PARTIAL REGRESSION PLOT Y X1 X2 X3 X4 SUBSET TAG > 2

Note:

    The following option controls which axis tic marks, tic mark labels, and axis labels are plotted.

      SET FACTOR PLOT LABELS <ON/OFF/XON/YON/BOX>
    OFF means that all axis labels are suppressed (this can be useful if a large number of variables are being plotted). ON means that both X and Y axis labels are printed. XON only plots the x axis labels and YON only plots the y axis labels.
    BOX is a special option that creates an extra column on the left and an extra row on the bottom. The axis label is printed in this box. BOX is typically reserved for the plot types that plot the variable names in the axes labels.
    The default is ON (both x and y axis labels are printed).
Note:

    The following option controls where the x axis tic marks, tic mark labels, and axis label are printed.

      SET FACTOR PLOT X AXIS <BOTTOM/TOP/ALTERNATE>
    BOTTOM specifies that the x axis labels are printed on the bottom axis (on the last row only). TOP specifies that the x axis labels are printed on the top axis (first row only). ALTERNATE specifies that the x axis labels alternate between the top (first row) and bottom axis (last row). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.
    The default is ALTERNATE.
Note:

    The following option controls where the y axis tic marks, tic mark labels, and axis label are printed.

      SET FACTOR PLOT Y AXIS <LEFT/RIGHT/ALTERNATE>
    LEFT specifies that the y axis labels are printed on the left axis (on the first column only). RIGHT specifies that the y axis labels are printed on the right axis (last column only). ALTERNATE specifies that the y axis labels alternate between the left (first column) and right axis (last column). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.
    The default is ALTERNATE.
Note:

    Users have different preferences in terms of whether the plot frames for neighboring plots are connected or not. This is controlled with the following option.

      SET FACTOR PLOT FRAME <DEFAULT/CONNECTED/USER>
    DEFAULT connects neighboring frames (i.e., the FRAME CORNER COORDINATES are set to 0 0 100 100). USER uses whatever frame coordinates are currently set (15 20 85 90 by default) and makes no special provisions for axis labels and tic marks (i.e., you set them as you normally would; each plot uses whatever you have set). CONNECTED uses whatever frame coordinates have been set by the user, but it draws the axis labels and tic marks as if DEFAULT were being used (that is, as determined by the SET FACTOR PLOT commands described above). Typically, CONNECTED is used to put a small bit of space between plots. For example, you might use FRAME CORNER COORDINATES 3 3 97 97 before the MATRIX PARTIAL REGRESSION PLOT command.
    Since the plots can often have different limits for the axes, the default is USER.
Note:

    When the tic marks and tic mark labels are all plotted on the same side (i.e., SET FACTOR PLOT Y AXIS is set to LEFT or RIGHT, or SET FACTOR PLOT X AXIS is set to BOTTOM or TOP), then overlap between plots is possible. The TIC OFFSET command can be used to avoid this. In addition, you can stagger the tic labels with the following command:

      SET FACTOR PLOT LABEL DISPLACEMENT <NORMAL/STAGGERED/VALUE>
    NORMAL means that all tic labels are plotted at a distance determined by the TIC LABEL DISPLACEMENT command. STAGGERED means that alternating plots will be staggered. That is, one will use the standard displacement while the next uses a staggered value. Entering this command with a numeric value specifies the amount of the displacement for the staggered tic labels. For example,


      TIC MARK LABEL DISPLACEMENT 10
      SET FACTOR PLOT LABEL DISPLACEMENT STAGGERED
      SET FACTOR PLOT LABEL DISPLACEMENT 25
    These commands specify that the default tic label displacement is 10 and the staggered tic mark label displacement is 25.
Note:

    It is often helpful on scatter plot matrices to overlay a fitted line on the plots. The following command is used to specify the type of fit.

      SET FACTOR PLOT FIT <NONE/LOWESS/LINE/QUAD/SMOOTH>
    NONE means that no fitted line is plotted. LOWESS means that a locally weighted least squares line will be overlaid. LINE means that a linear fit (Y = A0 + A1*X) will be overlaid. QUAD means that a quadratic fit (Y = A0 + A1*X + A2*X**2) will be overlaid. SMOOTH means that a least squares smoothing will be overlaid.
    For LOWESS, it is recommended that the lowess fraction be set fairly high (e.g., LOWESS FRACTION 0.6).
    The fitted line is currently only generated if the factor plot type is PLOT.
    The default is for no fitted line to be overlaid on the plot. If an overlaid fit is desired, the most common choice is to use LOWESS.
Note:

    Dataplot allows you to set axis limits with the LIMITS command. For the factor plot, it is often desirable to set the axis limits for each plot. This can be done with the command

      SET FACTOR PLOT YLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...
      SET FACTOR PLOT XLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...
    The default is to allow the axis limits to float with the data.
Note:

    You can use standard plot control commands to control the appearance of the factor plot. For example,


      MULTIPLOT CORNER COORDINATES 5 5 95 95
      MULTIPLOT SCALE FACTOR 3
      TIC OFFSET UNITS SCREEN
      TIC OFFSET 5 5
Default:

    None
Synonyms:

    None
Related Commands:


All replies
2016-10-7 09:45:14
For illustration I will take a less complex regression model $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$, where the predictor variables $X_2$ and $X_3$ may be correlated. Let's say the slopes $\beta_2$ and $\beta_3$ are both positive, so we can say that (i) $Y$ increases as $X_2$ increases, if $X_3$ is held constant, since $\beta_2$ is positive; (ii) $Y$ increases as $X_3$ increases, if $X_2$ is held constant, since $\beta_3$ is positive.

Note that it's important to interpret multiple regression coefficients by considering what happens when the other variables are held constant ("ceteris paribus"). Suppose I just regressed $Y$ against $X_2$ with a model $Y = \beta_1' + \beta_2' X_2 + \epsilon'$. My estimate for the slope coefficient $\beta_2'$, which measures the effect on $Y$ of a one-unit increase in $X_2$ without holding $X_3$ constant, may be different from my estimate of $\beta_2$ from the multiple regression, which also measures the effect on $Y$ of a one-unit increase in $X_2$ but does hold $X_3$ constant. The problem with my estimate $\hat{\beta}_2'$ is that it suffers from omitted-variable bias if $X_2$ and $X_3$ are correlated.

To understand why, imagine $X_2$ and $X_3$ are negatively correlated. Now when I increase $X_2$ by one unit, I know the mean value of $Y$ should increase since $\beta_2 > 0$. But as $X_2$ increases, if we don't hold $X_3$ constant then $X_3$ tends to decrease, and since $\beta_3 > 0$ this will tend to reduce the mean value of $Y$. So the overall effect of a one-unit increase in $X_2$ will appear lower if I allow $X_3$ to vary also, hence $\beta_2' < \beta_2$. Things get worse the more strongly $X_2$ and $X_3$ are correlated, and the larger the effect of $X_3$ through $\beta_3$; in a really severe case we may even find $\beta_2' < 0$ even though we know that, ceteris paribus, $X_2$ has a positive influence on $Y$!
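A small simulation makes the omitted-variable bias concrete. The sketch below is a hypothetical NumPy illustration (the coefficients, correlation and seed are made up, not from any real data): with $\beta_2 > 0$, $\beta_3 > 0$ and strongly negatively correlated predictors, the simple-regression slope comes out negative, and it satisfies the exact in-sample identity $\hat{\beta}_2' = \hat{\beta}_2 + \hat{\beta}_3 \hat{d}$, where $\hat{d}$ is the slope from regressing $X_3$ on $X_2$.

```python
import numpy as np

# Hypothetical simulation: beta2 = 1 and beta3 = 2 are both positive,
# while X2 and X3 are strongly negatively correlated.
rng = np.random.default_rng(3)
n = 500
x2 = rng.normal(size=n)
x3 = -0.9 * x2 + 0.4 * rng.normal(size=n)
y = 1.0 + 1.0 * x2 + 2.0 * x3 + rng.normal(size=n)

ones = np.ones(n)
b_multi, *_ = np.linalg.lstsq(np.column_stack([ones, x2, x3]), y, rcond=None)
b_simple, *_ = np.linalg.lstsq(np.column_stack([ones, x2]), y, rcond=None)
d, *_ = np.linalg.lstsq(np.column_stack([ones, x2]), x3, rcond=None)  # X3 regressed on X2

print("multiple regression beta2 hat:", b_multi[1])   # close to +1
print("simple regression beta2' hat:", b_simple[1])   # pulled below zero
# Omitted-variable identity (exact in sample): beta2' hat = beta2 hat + beta3 hat * d hat
print("identity check:", b_multi[1] + b_multi[2] * d[1])
```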

Hopefully you can now see why drawing a graph of $Y$ against $X_2$ would be a poor way to visualise the relationship between $Y$ and $X_2$ in your model. In my example, your eye would be drawn to a line of best fit with slope $\hat{\beta}_2'$ that doesn't reflect the $\hat{\beta}_2$ from your regression model. In the worst case, your model may predict that $Y$ increases as $X_2$ increases (with other variables held constant), and yet the points on the graph suggest $Y$ decreases as $X_2$ increases.

The problem is that in the simple graph of $Y$ against $X_2$, the other variables aren't held constant. This is the crucial insight into the benefit of an added variable plot (also called a partial regression plot): it uses the Frisch-Waugh-Lovell theorem to "partial out" the effect of other predictors. The horizontal and vertical axes on the plot are perhaps most easily understood* as "$X_2$ after other predictors are accounted for" and "$Y$ after other predictors are accounted for". You can now look at the relationship between $Y$ and $X_2$ once all other predictors have been accounted for. For example, the slope of the fitted line in each plot now equals the corresponding partial regression coefficient from your original multiple regression model.

A lot of the value of an added variable plot comes at the regression diagnostic stage, especially since the residuals in the added variable plot are precisely the residuals from the original multiple regression. This means outliers and heteroskedasticity can be identified in a similar way to when looking at the plot of a simple rather than multiple regression model. Influential points can also be seen; this is useful in multiple regression since some influential points are not obvious in the original data before you take the other variables into account. In my example, a moderately large $X_2$ value may not look out of place in the table of data, but if the $X_3$ value is large as well despite $X_2$ and $X_3$ being negatively correlated, then the combination is rare. "Accounting for other predictors", that $X_2$ value is unusually large and will stick out more prominently on your added variable plot.
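To see the "rare combination" effect numerically, here is a hypothetical NumPy sketch with deterministic made-up data (not from the example above): $X_3$ tracks $-X_2$ closely, and one extra observation has $X_2 = 1$ and $X_3 = 1$. Its raw $X_2$ value is unremarkable, but its horizontal coordinate on the added variable plot ("$X_2$ given others") is by far the largest.

```python
import numpy as np


def residuals(y, Z):
    """Residuals from the least-squares regression of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef


# Hypothetical deterministic data: X3 tracks -X2 closely (strong negative correlation).
x2 = np.linspace(-2.0, 2.0, 41)
x3 = -x2 + 0.1 * np.sin(7.0 * x2)

# Append one observation with a moderate X2 but a large X3, a rare combination.
x2 = np.append(x2, 1.0)
x3 = np.append(x3, 1.0)
odd = len(x2) - 1

# Horizontal coordinate of the added variable plot for X2: X2 given X3.
x2_given_x3 = residuals(x2, np.column_stack([np.ones(len(x2)), x3]))

print("largest raw X2 at index", np.argmax(x2))                              # not the odd point
print("largest |X2 given others| at index", np.argmax(np.abs(x2_given_x3)))  # the odd point
```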

* More technically, they would be the residuals from running two other multiple regressions: the residuals from regressing $Y$ against all predictors other than $X_2$ go on the vertical axis, while the residuals from regressing $X_2$ against all other predictors go on the horizontal axis. This is really what the legends of "$Y$ given others" and "$X_2$ given others" are telling you. Since the mean residual from both of these regressions is zero, the mean point of ($X_2$ given others, $Y$ given others) will just be (0, 0), which explains why the regression line in the added variable plot always goes through the origin. But I often find that mentioning the axes are just residuals from other regressions confuses people (unsurprising perhaps, since we are now talking about four different regressions!), so I have tried not to dwell on the matter. Think of them as "$X_2$ given others" and "$Y$ given others" and you should be fine.

2016-10-7 10:06:10
oliyiyi posted on 2016-10-7 09:42
PARTIAL REGRESSION PLOT
Name: PARTIAL REGRESSION PLOT
Type: Graphics Command
Purpose:
Thanks to the original poster for sharing; this is very good material!

2016-10-11 13:38:06
Showing my support.

2016-10-19 11:05:34
Thanks for sharing.