Forum: Econometrics and Statistics Forum, Zone 5, Econometrics and Statistics Software
2019-09-13
Python for Econometrics

New material added to the third edition on January 3, 2017

Introduction to Python for Econometrics, Statistics and Numerical Analysis: Third Edition

Python is a widely used general purpose programming language, which happens to be well suited to econometrics, data analysis and other more general numeric problems. These notes provide an introduction to Python for a beginning programmer. They may also be useful for an experienced Python programmer interested in using NumPy, SciPy, matplotlib and pandas for numerical and statistical analysis (if this is the case, much of the beginning can be skipped).

Third edition update:

  • Rewritten installation section focused exclusively on using Continuum's Anaconda.
  • Python 3.5 is the default version of Python instead of 2.7. Python 3.5 (or newer) is well supported by the Python packages required to analyze data and perform statistical analysis, and brings some useful new features, such as a new operator for matrix multiplication (@).
  • Removed distinction between integers and longs in built-in data types chapter. This distinction is only relevant for Python 2.7.
  • dot has been removed from most examples and replaced with @ to produce more readable code.
  • Split Cython and Numba into separate chapters to highlight the improved capabilities of Numba.
  • Verified all code working on current versions of core libraries using Python 3.5.
  • pandas
    • Updated syntax of pandas functions such as resample.
    • Added pandas Categorical.
    • Expanded coverage of pandas groupby.
    • Expanded coverage of date and time data types and functions.
  • New chapter introducing statsmodels, a package that facilitates statistical analysis of data. statsmodels includes regression analysis, Generalized Linear Models (GLM) and time-series analysis using ARIMA models.
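To illustrate the switch from dot to @, here is a minimal sketch that computes the OLS estimator both ways on simulated data. It assumes NumPy 1.17 or newer (for default_rng); the names X, y and beta_true are illustrative only and do not come from the notes.

```python
import numpy as np

# Simulated data (hypothetical): a constant plus two regressors
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# OLS estimator written with np.dot / .dot (the pre-@ style)
beta_dot = np.linalg.inv(np.dot(X.T, X)).dot(np.dot(X.T, y))

# The same estimator written with the @ matrix-multiplication operator
beta_at = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Solving the normal equations directly avoids forming an explicit inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_dot, beta_at), np.allclose(beta_at, beta_solve))
```

The @ version reads much closer to the textbook formula (X'X)^(-1) X'y, which is the readability gain the update refers to.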

Second edition update:

  • Improved Cython and Numba sections
  • Added sections discussing interfacing with C code
  • Added sections to the chapter on running code in Parallel covering IPython's cluster server and joblib
  • Further improvements in the installation based on feedback from the Python Course
  • Updated Anaconda to 1.9
  • Added information about using Spyder as an initial IDE.
  • Added packages for Spyder to the installation instructions.

New in second edition:

  • The preferred installation method is now Continuum Analytics' Anaconda. Anaconda is a complete scientific stack and is available for all major platforms.
  • New chapter on pandas. pandas provides a simple but powerful tool to manage data and perform basic analysis. It also greatly simplifies importing and exporting data.
  • New chapter on advanced selection of elements from an array.
  • Numba provides just-in-time compilation for numeric Python code which often produces large performance gains when pure NumPy solutions are not available (e.g. looping code).
  • Addition to performance section covering line_profiler for profiling code.
  • Dictionary, set and tuple comprehensions.
  • Numerous typos fixed.
  • All code has been verified working against Anaconda 1.7.0.
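The comprehension item above can be illustrated with a short sketch using made-up return data (the tickers and values are hypothetical). Strictly speaking, Python has no dedicated tuple comprehension syntax; wrapping a generator expression in tuple() is the usual way to get the same effect.

```python
# Hypothetical daily returns keyed by ticker (illustrative values only)
returns = {"AAA": [0.01, -0.02, 0.03], "BBB": [0.00, 0.01, 0.02]}

# Dictionary comprehension: mean return per ticker
mean_returns = {ticker: sum(r) / len(r) for ticker, r in returns.items()}

# Set comprehension: the distinct signs (-1, 0, 1) of all returns;
# duplicates collapse automatically because the result is a set
signs = {(x > 0) - (x < 0) for r in returns.values() for x in r}

# "Tuple comprehension": a generator expression passed to tuple().
# Note that (x ** 2 for x in ...) on its own creates a generator, not a tuple.
squared = tuple(x ** 2 for x in returns["AAA"])

print(mean_returns)
print(signs)
print(squared)
```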
Notes

Introduction to Python for Econometrics, Statistics and Numerical Analysis: Third Edition

Code

Code and Data for Introduction to Python for Econometrics, Statistics and Numerical Analysis
This is the code directly from the notes. It has been directly stripped from the master document, and allows for simple copy-and-paste execution.

Solutions for Introduction to Python for Econometrics, Statistics and Numerical Analysis
These solution files contain answers to the exercises at the end of the chapters. They are formatted for IPython's Demo module, and instructions for use are located in the docstring.

Add Python to the Windows Registry
This file allows a particular Python installation to become the default by changing the Windows registry. It is useful for virtual environments and allows binary installers to be used with a Python installation in any location.

IPython Notebooks

Example: GARCH
Example: Fama-MacBeth Regression

Data

FTSE 1984-2012 (zipped csv)
Fama-French Data (zipped csv)

Video Demonstrations

IPython
  • Core IPython - Key features of the IPython console including syntax highlighting, autocompletion, the command history and cell model.
  • IPython Magics - Magic keywords provide a wide range of features including on-the-fly configuration changes, file system manipulation, running Python programs and timing code.
  • Configuring IPython - Coming Soon. A brief introduction to customizing the IPython environment using configuration files.
IPython Notebook



All Replies
2019-9-13 17:22:25
Data Analysis in Python
  • Why Python?
  • Note to R Users
  • Note to Stata Users
  • 1. Setting Up Python
  • 2. Basic Python
  • 3. Pandas
  • 4. Installing Packages
  • Econometrics
  • Machine learning
  • Plotting
  • GIS in Python
  • Network Analysis
  • Making Python faster
  • Big Data / Parallelization
  • Text Analysis
  • Getting Help
  • Teaching Programming
  • R-to-Python Table
  • ST: iPython
  • ST: Command Line
  • ST: Git and Github

2019-9-13 17:24:13
Machine learning

The primary library for machine learning in Python is scikit-learn, which has its own excellent tutorial pages.

If you’re wondering about the difference between statsmodels and scikit-learn, the answer is: there’s no easy answer.

statsmodels is primarily written for and by econometricians, while scikit-learn is primarily written for and by computer scientists and people doing machine learning. But the relationship between “econometrics” and “machine learning” is complicated. In very broad terms, machine learning tends to focus on prediction while econometrics tends to focus on testing hypotheses. But that’s somewhat simplistic.

The reason is that econometrics and machine learning both developed when people in specific disciplines (economics and computer science, respectively) branched off from statistics to develop tools tailored to their own areas. For several decades, econometrics and machine learning developed more or less independently and in parallel, each borrowing from statistics but neither paying much attention to the other. As a result, there are some places where the two fields use the same tools but refer to them with different nomenclature, and other places where they do fundamentally different things.
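As a rough illustration of "same tool, different emphasis", the sketch below fits one least-squares regression and then uses it the econometrician's way (standard errors and t-statistics for hypothesis tests) and the machine-learning way (prediction error on held-out data). It uses plain NumPy on simulated data; the data-generating process, the helper ols() and the 300/100 train/test split are all hypothetical choices for illustration. In practice, statsmodels' OLS and scikit-learn's LinearRegression both fit this same least-squares problem, but they report very different things by default.

```python
import numpy as np

# Simulated data (hypothetical): y = 1 + 2*x1 + 0*x2 + noise
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.standard_normal(n)

def ols(X, y):
    """Least-squares coefficients ('coefficients' to one field, 'weights' to the other)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Econometrics emphasis: inference on the coefficients
beta = ols(X, y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])                # unbiased residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))   # classical (homoskedastic) SEs
t_stats = beta / se                                      # tests of H0: coefficient = 0

# Machine-learning emphasis: prediction error on data not used for fitting
train, test = slice(0, 300), slice(300, n)
beta_train = ols(X[train], y[train])
test_mse = np.mean((y[test] - X[test] @ beta_train) ** 2)

print("t-statistics:", np.round(t_stats, 2))
print("hold-out MSE:", round(float(test_mse), 3))
```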


2019-9-13 17:29:03

2019-9-13 17:43:22

In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number and, for instance, a multi-dimensional entry (aka multivariate data), it is said to have several attributes or features.
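In scikit-learn, and in NumPy-based code generally, such a data set is usually stored as a 2-D array of shape (n_samples, n_features): one row per sample, one column per feature. A minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical data set: n = 4 samples, each described by 3 features
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
])

n_samples, n_features = X.shape   # rows are samples, columns are features
first_sample = X[0]               # one multivariate entry (a vector of 3 features)
feature_means = X.mean(axis=0)    # average of each feature across the samples

print(n_samples, n_features)
print(first_sample)
print(feature_means)
```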

Learning problems fall into a few categories:

  • supervised learning, in which the data comes with additional attributes that we want to predict (see the scikit-learn supervised learning page). This problem can be either:

    • classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem would be handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning where one has a limited number of categories and for each of the n samples provided, one is to try to label them with the correct category or class.
    • regression: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight.
  • unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization (see the scikit-learn unsupervised learning page).
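The supervised/unsupervised distinction can be sketched on the same toy data: a classifier learns from labels, while clustering must find the groups without them. The example below is a hand-rolled NumPy sketch, not scikit-learn's API (scikit-learn offers estimators such as NearestCentroid and KMeans for real work); the two simulated groups and the helper functions predict() and kmeans() are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical, well-separated groups of 2-D points
a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X = np.vstack([a, b])
labels = np.repeat([0, 1], 50)

def predict(points, centres):
    """Index of the nearest centre for each point (Euclidean distance)."""
    dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Supervised (classification): learn one centroid per labelled class,
# then assign new, unlabelled points to the nearest centroid.
centroids = np.array([X[labels == c].mean(axis=0) for c in (0, 1)])
new_points = np.array([[0.2, -0.1], [2.8, 3.1]])
predicted = predict(new_points, centroids)

# Unsupervised (clustering): k-means never looks at the labels.
def kmeans(points, k, n_iter=20, seed=0):
    init_rng = np.random.default_rng(seed)
    centres = points[init_rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        assign = predict(points, centres)
        # Move each centre to the mean of its points; keep it if the cluster is empty
        centres = np.array([
            points[assign == j].mean(axis=0) if np.any(assign == j) else centres[j]
            for j in range(k)
        ])
    return predict(points, centres), centres

clusters, cluster_centres = kmeans(X, k=2)

print("predicted classes:", predicted)
print("cluster centres:", np.round(cluster_centres, 2))
```

Cluster numbers returned by k-means are arbitrary, so they may not match the original labels 0 and 1 even when the grouping is correct.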



2019-9-13 18:05:53
Really valuable material, generously shared!