Time Series Forecast Case Study with Python

oliyiyi

2017-02-13

Time series forecasting is a process, and the only way to get good forecasts is to practice this process.

In this tutorial, you will discover how to forecast the number of monthly armed robberies in Boston with Python.

Working through this tutorial will provide you with a framework for the steps and the tools for working through your own time series forecasting problems.

After completing this tutorial, you will know:

How to check your Python environment and carefully define a time series forecasting problem.
How to create a test harness for evaluating models, develop a baseline forecast, and better understand your problem with the tools of time series analysis.
How to develop an autoregressive integrated moving average model, save it to file, and later load it to make predictions for new time steps.

Let’s get started.

Time Series Forecast Case Study with Python – Monthly Armed Robberies in Boston
Photo by Tim Sackton, some rights reserved.

Overview

In this tutorial, we will work through a time series forecasting project from end-to-end, from downloading the dataset and defining the problem to training a final model and making predictions.

This project is not exhaustive, but shows how you can get good results quickly by working through a time series forecasting problem systematically.

The steps of this project that we will work through are as follows:

Environment.
Problem Description.
Test Harness.
Persistence.
Data Analysis.
ARIMA Models.
Model Validation.

This will provide a template for working through a time series prediction problem that you can use on your own dataset.

1. Environment

This tutorial assumes an installed and working SciPy environment and dependencies, including:

SciPy
NumPy
Matplotlib
Pandas
scikit-learn
statsmodels

I used Python 2.7. Are you on Python 3? Let me know how you go in the comments.

This script will help you check your installed versions of these libraries.

# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# statsmodels
import statsmodels
print('statsmodels: {}'.format(statsmodels.__version__))

The results on my workstation used to write this tutorial are as follows:

scipy: 0.18.1
numpy: 1.11.2
matplotlib: 1.5.3
pandas: 0.19.1
sklearn: 0.18.1
statsmodels: 0.6.1

2. Problem Description

The problem is to predict the number of monthly armed robberies in Boston, USA.

The dataset provides the number of monthly armed robberies in Boston from January 1966 to October 1975, or just under 10 years of data.

The values are a count and there are 118 observations.

The dataset is credited to McCleary & Hay (1980).

You can learn more about this dataset and download it directly from DataMarket.

Download the dataset as a CSV file and place it in your current working directory with the filename “robberies.csv”.

3. Test Harness

We must develop a test harness to investigate the data and evaluate candidate models.

This involves two steps:

Defining a Validation Dataset.
Developing a Method for Model Evaluation.

3.1 Validation Dataset

The dataset is not current. This means that we cannot easily collect updated data to validate the model.

Therefore we will pretend that it is October 1974 and withhold the last one year of data from analysis and model selection.

This final year of data will be used to validate the final model.

The code below will load the dataset as a Pandas Series and split into two, one for model development (dataset.csv) and the other for validation (validation.csv).

from pandas import Series
series = Series.from_csv('robberies.csv', header=0)
split_point = len(series) - 12
dataset, validation = series[0:split_point], series[split_point:]
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
dataset.to_csv('dataset.csv')
validation.to_csv('validation.csv')

Running the example creates two files and prints the number of observations in each.

Dataset 106, Validation 12

The specific contents of these files are:

dataset.csv: Observations from January 1966 to October 1974 (106 observations)
validation.csv: Observations from November 1974 to October 1975 (12 observations)

The validation dataset is about 10% of the original dataset (12 of 118 observations).

Note that the saved datasets do not have a header line, therefore we do not need to cater to this when working with these files later.
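
The listing above relies on Series.from_csv(), which newer pandas releases no longer provide. The following is a minimal sketch of the same split for such releases, not the tutorial's own code. It assumes the first CSV column holds the month and the second the count; the helper names load_series, split_for_validation and save_series are made up for illustration, and a synthetic series stands in for robberies.csv so the sketch runs on its own.

```python
import pandas as pd

def load_series(path):
    # Assumption: column 0 is the month, column 1 is the robbery count.
    frame = pd.read_csv(path, header=0, index_col=0, parse_dates=True)
    return frame.iloc[:, 0]

def split_for_validation(series, n_validation=12):
    # Hold back the last n_validation observations for final validation.
    split_point = len(series) - n_validation
    return series.iloc[:split_point], series.iloc[split_point:]

def save_series(series, path):
    # header=False keeps the saved file free of a header line.
    series.to_csv(path, header=False)

# Synthetic stand-in with the same shape as the real data:
# 118 monthly values from January 1966 to October 1975.
index = pd.date_range('1966-01-01', periods=118, freq='MS')
series = pd.Series(range(118), index=index, dtype='float64')

dataset, validation = split_for_validation(series)
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
print(dataset.index[-1].strftime('%Y-%m'), validation.index[0].strftime('%Y-%m'))
```

With the real file, series = load_series('robberies.csv') would replace the synthetic series, and save_series(dataset, 'dataset.csv') and save_series(validation, 'validation.csv') would write the two files without a header line.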

3.2. Model Evaluation

Model evaluation will only be performed on the data in dataset.csv prepared in the previous section.

Model evaluation involves two elements:

Performance Measure.
Test Strategy.

3.2.1 Performance Measure

The observations are a count of robberies.

We will evaluate the performance of predictions using the root mean squared error (RMSE). This will give more weight to predictions that are grossly wrong and will have the same units as the original data.
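
To see what "more weight to predictions that are grossly wrong" means in practice, here is a small sketch with made-up numbers. It compares RMSE with the mean absolute error (MAE, used here only as a point of comparison and not part of this tutorial's test harness) on two sets of predictions whose total absolute error is the same.

```python
from math import sqrt

def rmse(expected, predicted):
    # Root of the mean of squared errors.
    squared = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return sqrt(sum(squared) / len(squared))

def mae(expected, predicted):
    # Mean of absolute errors, for comparison only.
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)

expected = [100.0, 100.0, 100.0, 100.0]
even_errors = [110.0, 110.0, 110.0, 110.0]      # four errors of 10
one_gross_error = [100.0, 100.0, 100.0, 140.0]  # a single error of 40

print('even errors:     MAE=%.1f RMSE=%.1f' % (mae(expected, even_errors), rmse(expected, even_errors)))
print('one gross error: MAE=%.1f RMSE=%.1f' % (mae(expected, one_gross_error), rmse(expected, one_gross_error)))
```

Both cases have an MAE of 10, but the RMSE doubles from 10 to 20 when the error is concentrated in one prediction.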

Any transforms to the data must be reversed before the RMSE is calculated and reported to make the performance between different methods directly comparable.
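
As a sketch of what reversing a transform looks like, suppose a model had been fit to the log of the counts. The log transform and the numbers below are assumptions chosen for illustration, not a step this tutorial has taken. The forecasts are mapped back to counts with exp() before the error is computed.

```python
from math import exp, log, sqrt

# Hypothetical observed counts in the original units.
expected = [100.0, 200.0, 400.0]

# Hypothetical forecasts produced on the log scale.
log_predictions = [log(110.0), log(190.0), log(380.0)]

# Invert the transform so the forecasts are counts again.
predictions = [exp(value) for value in log_predictions]

# RMSE in the original units, comparable with untransformed models.
squared = [(e - p) ** 2 for e, p in zip(expected, predictions)]
rmse = sqrt(sum(squared) / len(squared))
print('RMSE: %.3f' % rmse)
```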

We can calculate the RMSE using the scikit-learn helper function mean_squared_error(), which calculates the mean squared error between a list of expected values (the test set) and a list of predictions. We can then take the square root of this value to give us an RMSE score.

For example:

from sklearn.metrics import mean_squared_error
from math import sqrt
...
test = ...
predictions = ...
mse = mean_squared_error(test, predictions)
rmse = sqrt(mse)
print('RMSE: %.3f' % rmse)

3.2.2 Test Strategy

Candidate models will be evaluated using walk-forward validation.

This is because the problem definition calls for a rolling forecast, where each one-step forecast is made using all of the data available at that point.

The walk-forward validation will work as follows:

The first 50% of the dataset will be held back to train the model.
The remaining 50% of the dataset will be iterated over to test the model.
For each step in the test dataset:
- A model will be trained.
- A one-step prediction made and the prediction stored for later evaluation.
- The actual observation from the test dataset will be added to the training dataset for the next iteration.
The predictions made during the iteration of the test dataset will be evaluated and an RMSE score reported.

Given the small size of the data, we will allow a model to be re-trained given all available data prior to each prediction.

We can write the code for the test harness using simple NumPy and Python code.

Firstly, we can split the dataset into train and test sets directly. We’re careful to always convert a loaded dataset to float32 in case the loaded data still has some String or Integer data types.

# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]

Next, we can iterate over the time steps in the test dataset. The train dataset is stored in a Python list as we need to easily append a new observation each iteration and NumPy array concatenation feels like overkill.

The prediction made by the model is called yhat by convention, as the outcome or observation is referred to as y and yhat (a ‘y’ with a mark above) is the mathematical notation for the prediction of the y variable.

The prediction and observation are printed at each step as a sanity check in case there are issues with the model.

# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # predict
    yhat = ...
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
    print('>Predicted=%.3f, Expected=%.3f' % (yhat, obs))
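
The harness above leaves the prediction step open. The following is a minimal runnable sketch of the whole loop, assuming a placeholder model that simply repeats the last observed value and a short made-up series in place of the robbery counts. The names walk_forward_rmse and last_value are hypothetical, and plain Python lists are used throughout so the sketch has no dependencies.

```python
from math import sqrt

def walk_forward_rmse(values, forecast, train_fraction=0.5):
    # Split the series: the first part seeds the history, the rest is the test set.
    train_size = int(len(values) * train_fraction)
    train, test = values[:train_size], values[train_size:]
    history = list(train)
    predictions = []
    for obs in test:
        # One-step forecast from everything observed so far.
        yhat = forecast(history)
        predictions.append(yhat)
        # The true observation becomes available for the next step.
        history.append(obs)
    squared = [(o - p) ** 2 for o, p in zip(test, predictions)]
    return sqrt(sum(squared) / len(squared)), predictions

def last_value(history):
    # Placeholder model (an assumption): repeat the most recent observation.
    return history[-1]

# Made-up series for illustration only.
values = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 21.0]
rmse, predictions = walk_forward_rmse(values, last_value)
print(predictions)
print('RMSE: %.3f' % rmse)
```

Any function that maps the history to a one-step forecast can be passed in place of last_value, which keeps the evaluation loop separate from the model being tested.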
