
Many economic and business variables are affected by seasonal factors.
For example, power usage is highest in the months when temperatures are
most extreme. The most common type of seasonality is variation due to
the time of year, but other types of seasonality are also found in time
series data.
Seasonal models are often multiplicative rather than
additive. A multiplicative model includes the product of one or more
nonseasonal parameters with one or more seasonal parameters. For
example, a multiplicative model with both autoregressive and moving
average terms (an ARMA model) and with yearly seasonality for a time
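series, $y_t$, can be written as (with monthly data, so the seasonal lag is 12):

$$(1 - \phi_1 B)(1 - \Phi_1 B^{12})(y_t - \mu) = (1 - \theta_1 B)(1 - \Theta_1 B^{12})\,\varepsilon_t$$

where

$\mu$ is the intercept parameter.
$\phi_1$ is the nonseasonal first-order autoregressive parameter.
$\Phi_1$ is the seasonal autoregressive parameter.
$\theta_1$ is the nonseasonal first-order moving average parameter.
$\Theta_1$ is the seasonal moving average parameter.
$B$ is the backshift operator ($B^k y_t = y_{t-k}$), and $\varepsilon_t$ is the white noise error term.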
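
To make the word "product" concrete, the moving average side of such a model can be expanded. This is only a sketch under assumed notation: $\theta_1$ for the nonseasonal MA parameter, $\Theta_1$ for the seasonal MA parameter, $B$ for the backshift operator, $\varepsilon_t$ for the error term, and a seasonal lag of 12 (monthly data). None of these symbols are confirmed by the original post.

```latex
(1 - \theta_1 B)(1 - \Theta_1 B^{12})\,\varepsilon_t
  = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \Theta_1 \varepsilon_{t-12}
    + \theta_1 \Theta_1 \varepsilon_{t-13}
```

The lag-13 coefficient is not a free parameter: it is forced to equal the product of the two MA parameters. An additive model with terms at lags 1 and 12 would simply have no lag-13 term.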
To identify a seasonal model, you need to examine the autocorrelation
function (ACF) and the inverse autocorrelation function (IACF) plots.
For multiplicative MA processes, there are small spikes in the ACF plot
q lags before and after the seasonal lag, where q
is the number of nonseasonal MA parameters necessary to model the data.
These small spikes are usually in the opposite direction of the
seasonal spike. For example, a multiplicative MA(1, 12) process
typically has small spikes at lags 11 and 13 on either side of, and in
the opposite direction of, a large spike at lag 12.
An additive MA process typically has small spikes q lags before the
seasonal lag, where q is the number of nonseasonal MA parameters
necessary to model the data. For example, an additive MA(1, 12) process
typically has a small spike at lag 11 and a larger spike at lag 12.
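
The spike patterns described above can be checked against the theoretical autocorrelations of a pure MA process. The DATA step below is a minimal sketch, not part of the original example: the parameter values 0.5 and 0.6 are hypothetical, and the names theta1, theta12, psi, and model are introduced only for illustration. It builds the MA weights for a multiplicative (model=1) and an additive (model=2) MA(1, 12) process and writes the autocorrelations at lags 11, 12, and 13 to the log.

```sas
data _null_;
   theta1  = 0.5;   /* hypothetical nonseasonal MA(1) parameter */
   theta12 = 0.6;   /* hypothetical seasonal MA parameter       */
   array psi[0:13] _temporary_;
   do model = 1 to 2;            /* 1 = multiplicative, 2 = additive */
      do j = 0 to 13;
         psi[j] = 0;
      end;
      psi[0]  = 1;
      psi[1]  = -theta1;
      psi[12] = -theta12;
      /* the lag-13 cross term exists only in the multiplicative model */
      if model = 1 then psi[13] = theta1 * theta12;
      gamma0 = 0;
      do j = 0 to 13;
         gamma0 = gamma0 + psi[j]**2;
      end;
      do lag = 11, 12, 13;
         gamma = 0;
         do j = 0 to 13 - lag;
            gamma = gamma + psi[j] * psi[j + lag];
         end;
         rho = gamma / gamma0;   /* theoretical autocorrelation */
         put model= lag= rho= 8.4;
      end;
   end;
run;
```

With these positive parameter values, the multiplicative model gives a negative autocorrelation at lag 12 flanked by smaller positive values at lags 11 and 13, while the additive model gives a nonzero value at lag 11 and exactly zero at lag 13.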
To identify an AR process, look for the patterns described previously
in the IACF plot rather than in the ACF plot. If a process contains
both AR and MA components, the patterns may appear in both the ACF and
IACF plots.
This example develops an ARMA model for steel shipments from U.S. steel mills.

Analysis

The identification and estimation of Autoregressive Integrated Moving
Average (ARIMA) models is more of an art than a science. Generally, the
most parsimonious model fitting the data is considered the best. This
example uses steel shipments data taken from Metal Statistics 1993. The
values represent monthly totals of steel products shipped from U.S.
steel mills, in thousands of net tons, for the period from January 1984
to December 1991. The following statements create the data set STEEL.

data steel;
   input date:monyy5. steelshp @@;
   format date monyy5.;
   title 'U.S. Steel Shipments Data';
   title2 '(thousands of net tons)';
   datalines;
JAN84 5980 FEB84 6150 MAR84 7240 APR84 6472
MAY84 6948 JUN84 6686 JUL84 5820 AUG84 6033
SEP84 5454 OCT84 6087 NOV84 5317 DEC84 4867
... more data lines ...
;

The analysis performed by the ARIMA procedure
is divided into three stages, corresponding to the stages described by
Box and Jenkins (1976). The IDENTIFY, ESTIMATE, and FORECAST statements
perform these three stages. In the identification stage, you use the
IDENTIFY statement to specify the response series and identify
candidate ARIMA models for it. The IDENTIFY statement reads time series
that are to be used in later statements, possibly differencing them,
and computes autocorrelations, inverse autocorrelations, partial
autocorrelations, and cross correlations. The analysis of this output
usually suggests one or more ARIMA models that could be fit. The VAR=
option specifies the variable to be identified.

proc arima data=steel;
   identify var=steelshp;
run;
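
The differencing mentioned above is requested in parentheses after the variable name. The statements below are a sketch of how one might compare the series in levels with seasonally differenced versions. They are not part of the original example, which identifies the series in levels only. NLAG=24 is written out explicitly to match the 24 lags shown in the output.

```sas
proc arima data=steel;
   /* series in levels, as in the original example */
   identify var=steelshp nlag=24;
   /* seasonal (lag 12) difference */
   identify var=steelshp(12) nlag=24;
   /* first difference combined with a lag 12 difference */
   identify var=steelshp(1,12) nlag=24;
run;
quit;
```

If the ACF of the undifferenced series dies out reasonably quickly, as it does here, differencing is usually unnecessary.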

U.S. Steel Shipments Data
(thousands of net tons)

The ARIMA Procedure

Autocorrelations

Lag  Covariance  Correlation  Std Error  Plot (0 to 1)
  0      406442      1.00000  0          |********************
  1      262630      0.64617  0.102062   |*************
  2      261597      0.64363  0.138258   |*************
  3      235909      0.58042  0.166570   |************
  4      168515      0.41461  0.186451   |********
  5      201896      0.49674  0.195820   |**********
  6      129000      0.31739  0.208533   |******
  7      152701      0.37570  0.213506   |********
  8      113117      0.27831  0.220285   |******
  9      127532      0.31378  0.223918   |******
 10      137000      0.33707  0.228452   |*******
 11      130723      0.32163  0.233575   |******
 12      200408      0.49308  0.238144   |**********
 13      112496      0.27678  0.248551   |******
 14      135119      0.33244  0.251741   |*******
 15      103295      0.25414  0.256273   |*****
 16   62982.090      0.15496  0.258885   |***
 17      108381      0.26666  0.259850   |*****
 18   42836.479      0.10539  0.262685   |**
 19   65840.039      0.16199  0.263125   |***
 20   37765.859      0.09292  0.264162   |**
 21   27790.106      0.06837  0.264502   |*
 22   40303.846      0.09916  0.264686   |**
 23   46097.710      0.11342  0.265073   |**
 24   76317.464      0.18777  0.265578   |****

In the original listing, "." marks two standard errors.

The large spike at lag 12 in the ACF plot provides evidence that the
steel shipments time series has a seasonal autoregressive component.
The lack of a large spike at lag 24 indicates that the series is
stationary at the seasonal level.
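
The Std Error column in the autocorrelation listing appears to be consistent with Bartlett's approximation, in which the standard error at lag k is sqrt((1 + 2 * sum of squared autocorrelations at lags 1 to k-1) / n). The DATA step below is a sketch that checks this for the first four lags. It assumes n = 96 monthly observations (January 1984 to December 1991) and copies the first three autocorrelations from the listing. The data set name bartlett and the variable names are introduced only for illustration.

```sas
data bartlett;
   n = 96;                                   /* assumed number of observations */
   array r[3] _temporary_ (0.64617 0.64363 0.58042);
   sumsq = 0;                                /* running sum of squared r(j), j < lag */
   do lag = 1 to 4;
      stderr = sqrt((1 + 2 * sumsq) / n);    /* Bartlett's approximation */
      output;
      if lag <= 3 then sumsq = sumsq + r[lag]**2;
   end;
   keep lag stderr;
run;

proc print data=bartlett noobs;
   format stderr 8.6;
run;
```

The computed values should agree with the listed 0.102062, 0.138258, 0.166570, and 0.186451 to within rounding, which is why the two-standard-error band widens as the lag increases.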

Inverse Autocorrelations

Lag  Correlation  Plot (-1 to 1)
  1     -0.37291  *******|
  2      0.08136         |**
  3     -0.31032   ******|
  4      0.16197         |***
  5     -0.20750     ****|
  6      0.16115         |***
  7     -0.02341         |
  8      0.06910         |*
  9      0.00628         |
 10      0.02046         |
 11      0.02875         |*
 12     -0.23279    *****|
 13      0.03755         |*
 14      0.04050         |*
 15      0.03498         |*
 16      0.09969         |**
 17     -0.10703       **|
 18      0.04901         |*
 19     -0.08634       **|
 20      0.02281         |
 21      0.00844         |
 22      0.10510         |**
 23     -0.10923       **|
 24      0.02676         |*

The spikes at lags 1 and 3 in the IACF plot indicate that other
components are necessary to fit an adequate model. In the
Autocorrelation Check for White Noise, the null hypothesis that the
series is white noise is resoundingly rejected (p < .0001 for every
group of lags).

Autocorrelation Check for White Noise

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6      170.51   6      <.0001  0.646  0.644  0.580  0.415  0.497  0.317
    12      255.47  12      <.0001  0.376  0.278  0.314  0.337  0.322  0.493
    18      296.96  18      <.0001  0.277  0.332  0.254  0.155  0.267  0.105
    24      309.34  24      <.0001  0.162  0.093  0.068  0.099  0.113  0.188

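
The chi-square values in the white noise check can be reproduced by hand with the Ljung-Box formula, Q = n(n+2) * sum of r(k)**2 / (n-k) for k = 1 to m. The DATA step below is a sketch for the first row (m = 6). It assumes n = 96 and uses the six autocorrelations as printed in the Autocorrelations listing; the variable names are illustrative only.

```sas
data _null_;
   n = 96;                                   /* assumed number of observations */
   array r[6] _temporary_
      (0.64617 0.64363 0.58042 0.41461 0.49674 0.31739);
   s = 0;
   do k = 1 to 6;
      s = s + r[k]**2 / (n - k);
   end;
   q = n * (n + 2) * s;                      /* Ljung-Box statistic to lag 6 */
   p = 1 - probchi(q, 6);                    /* upper-tail chi-square probability */
   put q= 8.2 p= pvalue6.4;
run;
```

The result should be close to the reported 170.51 with 6 degrees of freedom, far beyond any conventional critical value.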
In the estimation and diagnostic checking
stage, you use the ESTIMATE statement to specify the ARIMA model to fit
to the variable specified in the previous IDENTIFY statement and to
estimate the parameters of that model. The ESTIMATE statement also
produces diagnostic statistics to help you judge the adequacy of the
model.
Significance tests for parameter estimates indicate
whether some terms in the model may be unnecessary. Goodness-of-fit
statistics aid in comparing this model to others. Tests for white noise
residuals indicate whether the residual series contains additional
information that might be used by a more complex model. If the
diagnostic tests indicate problems with the model, you try another
model, then repeat the estimation and diagnostic checking stage.
The following statement fits a
seasonal ARMA model to the time series. In the syntax of the ESTIMATE
statement, the two multiplicative AR terms, denoted by the P= option,
are enclosed in separate parentheses. The two additive MA terms,
denoted by the Q= option, are separated by a space within a single set
of parentheses.

   estimate p=(2)(12) q=(1 3);
run;
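
The difference between separate parentheses and a single set of parentheses is easiest to see by fitting both forms. The statements below are a sketch, not part of the original example: the second ESTIMATE statement fits an additive AR specification purely for comparison, so its fit statistics (AIC and SBC) can be set against the multiplicative one.

```sas
proc arima data=steel;
   identify var=steelshp noprint;
   /* multiplicative AR: (1 - a B**2)(1 - b B**12), implies a lag 14 cross term */
   estimate p=(2)(12) q=(1 3);
   /* additive AR: 1 - a B**2 - b B**12, no cross term (comparison only) */
   estimate p=(2 12) q=(1 3);
run;
quit;
```

Both specifications estimate the same number of parameters, so the choice between them rests on fit and on which lag pattern the ACF and IACF plots support.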

Autocorrelation Check of Residuals

To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6        2.42   2      0.2979  -0.009  -0.051   0.071   0.070   0.104   0.018
    12        3.63   8      0.8891  -0.084   0.032  -0.024   0.013  -0.033  -0.035
    18       11.86  14      0.6176  -0.082   0.168   0.014  -0.137   0.107   0.073
    24       16.16  20      0.7066   0.023   0.019  -0.010  -0.047   0.174  -0.000

Model for variable steelshp

Estimated Mean    6057.122

Autoregressive Factors
Factor 1:    1 - 0.54234 B**(2)
Factor 2:    1 - 0.64802 B**(12)

Moving Average Factors
Factor 1:    1 + 0.55505 B**(1) + 0.43689 B**(3)

The Autocorrelation Check of Residuals shows that none of the
Q-statistics are statistically significant. This indicates that the
model provides an adequate fit to the data.
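
Note that the DF column in the residual check is smaller than the number of lags: it appears to be the number of lags minus the four estimated ARMA parameters (two AR, two MA). The DATA step below is a sketch of that bookkeeping and of the p-value for the first row. The variable names are illustrative, and the Q value 2.42 is taken as printed.

```sas
data _null_;
   n_arma = 4;                       /* MA1,1 MA1,2 AR1,1 AR2,1 */
   do to_lag = 6, 12, 18, 24;
      df = to_lag - n_arma;          /* degrees of freedom for each row */
      put to_lag= df=;
   end;
   p6 = 1 - probchi(2.42, 2);        /* p-value for Q = 2.42 on 2 DF */
   put p6= 6.4;
run;
```

The computed p-value should be close to the reported 0.2979; any small gap presumably comes from Q being rounded to two decimals in the listing.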

Conditional Least Squares Estimation

Parameter    Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU             6057.1       232.96713    26.00           <.0001    0
MA1,1        -0.55505         0.08021    -6.92           <.0001    1
MA1,2        -0.43689         0.07936    -5.51           <.0001    3
AR1,1         0.54234         0.09903     5.48           <.0001    2
AR2,1         0.64802         0.09392     6.90           <.0001   12

Constant Estimate      975.7391
Variance Estimate      126334.1
Std Error Estimate     355.4351
AIC                    1404.983
SBC                    1417.805
Number of Residuals          96

All of the estimated parameters have large t-statistics (|t| > 5,
p < .0001), which indicates that none of them should be omitted from
the model.

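
Several of the summary numbers in the estimation output can be cross-checked from the parameter estimates. The DATA step below is a sketch under these assumptions: the constant equals the mean times the AR polynomials evaluated at B = 1, each t value is the estimate divided by its standard error, and SBC minus AIC equals k * (log(n) - 2) with k = 5 parameters and n = 96 residuals. The variable names are illustrative only.

```sas
data _null_;
   mu  = 6057.122;                           /* Estimated Mean      */
   ar1 = 0.54234;                            /* AR1,1 (lag 2)       */
   ar2 = 0.64802;                            /* AR2,1 (lag 12)      */
   constant = mu * (1 - ar1) * (1 - ar2);    /* compare: 975.7391   */
   t_ma1 = -0.55505 / 0.08021;               /* compare: -6.92      */
   sbc_minus_aic = 5 * (log(96) - 2);        /* compare: 1417.805 - 1404.983 */
   put constant= 10.4 t_ma1= 6.2 sbc_minus_aic= 8.3;
run;
```

The computed constant lands near 975.7; it differs slightly from the printed 975.7391 because the estimates used here are rounded to five decimals.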
In the forecasting stage, you use the FORECAST statement to forecast
future values of the time series and to generate confidence intervals
for these forecasts from the ARIMA model produced by the preceding
ESTIMATE statement.

The following statement produces forecasts and upper and lower 95%
confidence limits for 12 future periods and creates the output data set
STEEL2.

   forecast lead=12 out=steel2 id=date interval=month noprint;
run;

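
Because NOPRINT suppresses the forecast listing, it can help to look at the output data set directly. The step below is a sketch that assumes the input series has 96 observations, so the 12 future periods start at observation 97, and that the OUT= data set uses the default variable names FORECAST, STD, L95, and U95.

```sas
proc print data=steel2(firstobs=97) noobs;
   var date forecast std l95 u95;
   format date monyy5.;
run;
```

Observations 1 to 96 of STEEL2 hold the one-step-ahead forecasts for the historical period, which is why the next step blanks them out before plotting.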
To prepare the output data set for plotting,
change the values for the forecasts and confidence limits to missing
for all dates prior to the future forecast periods.

data steel3;
   set steel2;
   /* keep forecasts and limits only for the future periods */
   if date lt '01jan92'd then do;
      forecast=.;
      l95=.;
      u95=.;
   end;
run;

Use the GPLOT procedure to plot the data.

proc gplot data=steel3;
   format date year4.;
   /* actual series, forecasts, and both confidence limits on one plot */
   plot steelshp*date=1
        forecast*date=2
        l95*date=3
        u95*date=3 / overlay cframe=ligr haxis=axis1 vaxis=axis2
                     vminor=1 href='01jan92'd;
   title 'U.S. Steel Shipments Data';
   title2 '(thousands of net tons)';
   axis1 offset=(1 cm) label=('Year') minor=none
         order=('01jan84'd to '01jan93'd by year);
   axis2 label=(angle=90 'Steel Shipments')
         order=(4500 to 8500 by 1000);
   symbol1 c=blue i=join l=1 v=star;
   symbol2 c=red i=join l=1 v=F;
   symbol3 c=green i=join l=20;
run;
quit;


The values of the original steel shipments time series are plotted with
the star symbol. The forecasts are plotted with the F symbol, and the
upper and lower 95% confidence limits for the forecasts are plotted
with dashed lines.
Because the model fit to the steel shipments data
includes a seasonal component, the forecasts do not follow a simple
linear trend. Instead, the forecasts show variability due to the season
(month of the year).
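
For readers without SAS/GRAPH, a similar plot could be drawn with the SGPLOT procedure. This is an alternative sketch, not the method used in the original example. It assumes SAS 9.2 or later and reuses the STEEL3 data set created above, and it shows the confidence limits as a shaded band instead of dashed lines.

```sas
proc sgplot data=steel3;
   format date year4.;
   /* shaded 95% band; only the forecast periods have nonmissing limits */
   band x=date lower=l95 upper=u95 / transparency=0.5 legendlabel='95% limits';
   series x=date y=steelshp / legendlabel='Actual';
   series x=date y=forecast / legendlabel='Forecast';
   /* vertical line at the start of the forecast horizon */
   refline '01jan92'd / axis=x;
   xaxis label='Year';
   yaxis label='Steel Shipments';
run;
```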
[This post was edited by angelboy on 2008-6-5 10:32:01]
