SPSS Technotes:Spline Regression with Estimated Knots in SPSS

1916

收藏 2014-04-21

Technote (troubleshooting)
Problem(Abstract)[size=1em]I want to fit a regression model where I expect the slope to shift up or down, or even change sign, at a particular value of the predictor. Also, I want that shift point to be estimated as a parameter in the model. Can this model be fitted in SPSS Statistics?

Resolving the problem[size=1em]Regression models in which the function changes at one or more points along the range of the predictor are called splines, or piecewise polynomials, and the location of these shifts are called knots. If the knots are fixed by the analyst, then splines can be fitted quite easily with the SPSS REGRESSION procedure. The properties of these splines are described in Technote 1476028, with a description of the process of fitting them with REGRESSION. If the knots are to be estimated from the data, i.e. if the knots are variable, then you can fit many such models with the SPSS NLR and CNLR procedures, as described below. These procedures perform Nonlinear Regression and Constrained Nonlinear Regression, respectively.

One property of NLR which allows us to fit variable-knot models is the facility to express multiplicative relationships between parameters. The MODEL PROGRAM command from the first example, a 3-phase linear spline model, illustrates this point.

MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25.
COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa GE knot1)
+ bc1*(xa-knot2)*(xa GE knot2).
NLR y1 with xa / pred = predex1 /save pred resid (residex1).

The parameters ba0 and ba1 are the intercept and the linear coefficient, respectively, for the predictor, XA. The parameter bb1 is the second-phase adjustment to the slope. It is applied to (xa - knot1), where knot1 is the first knot. By multiplying this model term by the logical expression (xa GE knot1), we limit the influence of bb1 to values of XA at or above the first knot).

The parameter bc1 is a further adjustment to the slope in the third phase. Its relationship to knot2 is similar to the bb1-knot1 relationship. By naming knot1 and knot2 in the MODEL PROGRAM statement, we declare them as parameters to be estimated. The numbers assigned to each of the parameters are starting values for the iterative maximum likelihood estimation process used by NLR and CNLR.

However, one constraint must be placed on variable-knot models which is not required in fixed-knot models. The model must be continuous at the knots. There cannot be a "jump", or step, in the model at the knot. This constraint simply requires the absence of a phase 2 intercept, or step parameter, to be enforced in either NLR or CNLR. Also, the examples below are restricted to models where the number of knots is fixed by the analyst, although their location is estimated from the data. Finally, if you have more than one knot in the model, you may wish to constrain 'later' knots to have higher values than 'earlier' knots. The assignment of higher starting values for 'later' knots than 'earlier' knots may be sufficient for this purpose, but in a poorly-identified model, the additional constraint may be helpful. The CNLR procedure must be used for this purpose, with the constraints expressed in the /BOUNDS subcommand. Such a constraint is illustrated in Example 1, which has 2 knot points.

[size=1em]For a discussion of spline regression models with fixed or variable knots, see Chapter 9 of :.
Seber, G.A.F., & Wild, C.J. (1989). Nonlinear regression. New York: Wiley,

Three examples of analyses of variable-knot spline regression models are shown below. An input program is presented that constructs a data set that allows you to run these commands in SPSS. The predictor in each example is the variable XA. There are 3 replicates at each value of XA. The dependent variables are Y1, Y2, and Y3 for examples 1, 2, and 3, respectively. They are generated with SPSS commands just before their respective analyses. The examples are somewhat contrived and we may have cheated slightly by using parameter starting values that equal the data-generating values for those parameters. However, these examples are merely illustrative of the process of using these commands for spline models.

******************************************.

SET SEED = 2000000.
title 'spline regression with estimated knots, good start values'.
INPUT PROGRAM.
LOOP xa = 1 TO 35.
LOOP rep = 1 TO 3.
LEAVE xa.
END case.
END LOOP.
END LOOP.
END file.
END INPUT PROGRAM.
EXECUTE.

* EXAMPLE 1
In the first example, the linear model has 2 knots,
with starting values at XA=15 and XA=25,
and is continuous at both knot points.
* y1 is dep var for example 1, with knots at 15 and 25, linear .
COMPUTE y1=3 + 3*xa + normal(2).
IF (xa gt 15) y1=y1 - 4*(xa-15).
IF (xa gt 25) y1=y1 + 2*(xa-25).
GRAPH
/SCATTERPLOT(BIVAR)=xa WITH y1 .

* Constraints not placed on knot values.
MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25.
COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa GE knot1)
+ bc1*(xa-knot2)*(xa GE knot2).
NLR y1 / PRED = predex1 /SAVE pred resid (residex1).
* Diagnostic plots of residuals, predicted values .
GRAPH
/SCATTERPLOT(BIVAR)= predex1 WITH residex1 .
GRAPH
/SCATTERPLOT(OVERLAY)=xa xa WITH y1 predex1 (PAIR) .

* Note that in SPSS versions prior to 5.0, you must include the
* predictor(s) in the NLR and CNLR commands, as in:
* NLR y1 WITH xa / PRED = predex1 /SAVE pred resid (residex1).
* Including the predictors list returns a warning in later SPSS versions,
* but the command will still run.

* Example 1 with KNOT2 constrained to be greater than KNOT1 .
MODEL PROGRAM ba0=3 ba1=3 bb1=-4 bc1=2 knot1=15 knot2=25.
COMPUTE predex1 = ba0 + ba1*xa + bb1*(xa-knot1)*(xa ge knot1)
+ bc1*(xa-knot2)*(xa ge knot2).
CNLR y1 / PRED = predex1 /SAVE pred resid (residex1)
/ BOUNDS knot2 - knot1 > 0 .
GRAPH
/SCATTERPLOT(BIVAR)= predex1 WITH residex1 .
GRAPH
/SCATTERPLOT(OVERLAY)=xa xa WITH y1 predex1 (PAIR) .

* EXAMPLE 2.
* In the second example, there is one knot at XA=15. The spline is
* continuous at the knot, but the order of the polynomial changes.
* Before the knot, the relationship is linear; after the knot, quadratic.
* Y2 is dep var for example 2, knot at 15, linear,quadratic.
COMPUTE y2 = 4 + 2*xa + normal(.5).
IF (xa gt 15) y2=y2 + (xa - 15) - .5*(xa-15)*(xa-15).
GRAPH
/SCATTERPLOT(BIVAR)=xa WITH y2 .

* In this model, we compute an intermediate term, pc2, to
* stand for the expression (xa GE knot1). This term is just
* used to avoid repetitive typing of the logical expression.
* Note that knot1 is still the parameter to be estimated.

MODEL PROGRAM ba0=4 ba1=2 bb1=1 bb2=-0.5 knot1=15 .
COMPUTE pc2 = (xa GE knot1).
COMPUTE predex2 = ba0 + ba1*xa + bb1*(xa-knot1)*pc2
+ bb2*((xa-knot1)**2)*pc2.
NLR y2 / PRED = predex2 /SAVE pred resid (residex2).

* Diagnostic plots of residuals, predicted values .
GRAPH
/SCATTERPLOT(BIVAR)= predex2 WITH residex2 .
GRAPH
/SCATTERPLOT(OVERLAY)=xa xa WITH y2 predex2 (PAIR) .

* EXAMPLE 3.
* In the third example, there is one knot at XA=15. A cubic spline is
* fitted with the constraint that all second-order and lower-order
* derivatives are equal on both sides of the knot, which is equivalent to
* saying that only the cubic term of the function is adjusted after the
* knot. The continuity constraints are gradually relaxed in subsequent NLR
* runs that introduce the lower-order XB terms into the
* model. If you examine the correlations among the parameter estimates, you
* will see that the latter 2 models are poorly identified as many of these
* correlations are very high. In the final model, some of the parameter estimates
* are perfectly correlated. These models are shown here as illustrations
* of gradually loosening constraints in the model.
COMPUTE y3 = 6 + 3.5*xa - 1.5*xa*xa + .5*xa*xa*xa + normal(.5).
IF (xa gt 15) y3 = y3 - 2*(xa-15)*(xa-15)*(xa-15).
GRAPH
/SCATTERPLOT(BIVAR)=xa WITH y3 .

MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5 bb3=-2 knot1=15 .
COMPUTE pc2 = (xa ge knot1).
COMPUTE predex3 = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa
+ bb3*((xa-knot1)**3)*pc2.
NLR y3 / PRED = predex3 /SAVE pred resid (residex3).

* Diagnostic plots of residuals, predicted values .
GRAPH
/SCATTERPLOT(BIVAR)= predex3 WITH residex3 .
GRAPH
/SCATTERPLOT(OVERLAY)=xa xa WITH y3 predex3 (PAIR) .

* example 3 but with a quadratic term added to 2nd piece .
MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5 bb2=0 bb3=-2 knot1=15 .
COMPUTE pc2 = (xa ge knot1).
COMPUTE pred = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa
+ bb2*((xa-knot1)**2)*pc2 + bb3*((xa-knot1)**3)*pc2.
NLR y3.

* example 3 but with a quadratic & linear term added to 2nd piece .
MODEL PROGRAM ba0=6 ba1=3.5 ba2=-1.5 ba3=0.5
bb1=0 bb2=0 bb3=-2 knot1=15 .
COMPUTE pc2 = (xa ge knot1).
COMPUTE pred = ba0 + ba1*xa + ba2*xa*xa + ba3*xa*xa*xa
+ bb1*(xa-knot1)*pc2 + bb2*((xa-knot1)**2)*pc2
+ bb3*((xa-knot1)**3)*pc2.
NLR y3.

Historical Number[size=1em]19430

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

扫码加我 拉你入群

分享

扫码加好友，拉您进群

扫码加我拉你入群