New material added to the third edition on January 3, 2017
Introduction to Python for Econometrics, Statistics and Numerical Analysis: Third Edition
Python is a widely used general-purpose programming language that happens to be well suited to econometrics, data analysis and other, more general numeric problems. These notes provide an introduction to Python for a beginning programmer. They may also be useful for an experienced Python programmer interested in using NumPy, SciPy, matplotlib and pandas for numerical and statistical analysis (in which case much of the beginning can be skipped).
Third edition update:
Second edition update:
New in second edition:
Introduction to Python for Econometrics, Statistics and Numerical Analysis: Third Edition
Code and Data for Introduction to Python for Econometrics, Statistics and Numerical Analysis
This is the code from the notes. It has been stripped directly from the master document, which allows for simple copy-and-paste execution.
Solutions for Introduction to Python for Econometrics, Statistics and Numerical Analysis
These solution files contain answers to the exercises at the end of the chapters. They are formatted for IPython's Demo module, and instructions for use are located in the docstring.
Add Python to the Windows Registry
This file allows a particular Python installation to become the default by changing the registry. It is useful for virtual environments and allows binary installers to be used with any location.
Example: GARCH
Example: Fama-MacBeth Regression
FTSE 1984-2012 (zipped csv)
Fama-French Data (zipped csv)
The primary library for machine learning in Python is scikit-learn, which has its own great tutorial page.
If you’re wondering about the difference between statsmodels and scikit-learn, the answer is: there’s no easy answer.
statsmodels is primarily written for and by econometricians, while scikit-learn is primarily written for and by computer scientists and people doing machine learning. But the relationship between “econometrics” and “machine learning” is complicated. In very broad terms, machine learning tends to focus on prediction while econometrics tends to focus on testing hypotheses. But that’s somewhat simplistic.
The reason is that econometrics and machine learning both developed when people in specific disciplines (economics and computer science respectively) branched off statistics to develop tools tailored for their own area. For several decades, econometrics and machine learning more or less developed independently and in parallel, each borrowing from statistics, but neither really paying attention to the other. As a result, there are some places where the two fields use the same tools but refer to them with different nomenclature, and other places where they actually do fundamentally different things.
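As a rough illustration of the "testing hypotheses" versus "prediction" contrast described above, the sketch below fits the same linear model once and then looks at it both ways. It uses only NumPy, the data are simulated, and the coefficients, sample sizes and seed are assumptions chosen purely for illustration. It is not how statsmodels or scikit-learn implement these steps internally.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely illustrative

# Simulated data: y = 1 + 2*x + noise (these coefficients are made up)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # add a constant column

# Split into a training half and a hold-out half
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

# OLS estimates from the training half
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# "Econometrics" view: standard errors and t-statistics for H0: beta_j = 0
resid = y_train - X_train @ beta
dof = X_train.shape[0] - X_train.shape[1]
sigma2 = resid @ resid / dof
cov_beta = sigma2 * np.linalg.inv(X_train.T @ X_train)
std_err = np.sqrt(np.diag(cov_beta))
t_stats = beta / std_err

# "Machine learning" view: how well does the fit predict unseen data?
pred = X_test @ beta
test_mse = np.mean((y_test - pred) ** 2)

print("coefficients:", beta)
print("t-statistics:", t_stats)
print("hold-out MSE:", test_mse)
```

The estimator is identical in both views; what differs is the question asked of it. The t-statistics address whether a coefficient is distinguishable from zero, while the hold-out error addresses how well the model generalizes to new observations.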
Learning problems fall into a few categories:
supervised learning, in which the data comes with additional attributes that we want to predict (see the scikit-learn supervised learning page). This problem can be either:
- classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem would be handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning where one has a limited number of categories and for each of the n samples provided, one is to try to label them with the correct category or class.
- regression: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight.
unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data, which is called clustering; to determine the distribution of data within the input space, known as density estimation; or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization (see the scikit-learn unsupervised learning page).
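The three kinds of problem listed above can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions: the choice of estimators (k-nearest neighbors, linear regression, k-means) is only one of many possibilities, and the salmon data are simulated with made-up coefficients because no real dataset is given here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# --- Classification: handwritten digit recognition ---
digits = load_digits()  # 8x8 digit images bundled with scikit-learn
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)                # learn from labeled data
accuracy = clf.score(X_test, y_test)     # share of held-out digits labeled correctly
print("classification accuracy:", accuracy)

# --- Regression: salmon length from age and weight (simulated data) ---
rng = np.random.default_rng(0)
age = rng.uniform(1, 5, size=200)        # hypothetical ages
weight = rng.uniform(1, 10, size=200)    # hypothetical weights
# Hypothetical relationship, chosen only so there is something to fit
length = 20 + 5 * age + 3 * weight + rng.normal(scale=2, size=200)
features = np.column_stack([age, weight])
reg = LinearRegression().fit(features, length)
r2 = reg.score(features, length)         # in-sample R^2
print("regression coefficients:", reg.coef_, "R^2:", r2)

# --- Unsupervised learning: clustering the digits without their labels ---
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
labels = kmeans.fit_predict(digits.data)  # cluster index for every image
print("cluster sizes:", np.bincount(labels, minlength=10))
```

Note that the clustering step never sees `digits.target`; the cluster indices it returns are arbitrary and need not line up with the actual digit values.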