Think Stats: Probability and Statistics for Programmers
By Allen B. Downey
Publisher: O'Reilly Media
Final Release Date: July 2011
Pages: 138
If you know how to program, you have the skills to turn data into knowledge using the tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python.
You'll work with a case study throughout the book to help you learn the entire data analysis process—from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts.
- Develop your understanding of probability and statistics by writing and testing code
- Run experiments to test statistical behavior, such as generating samples from several distributions
- Use simulations to understand concepts that are hard to grasp mathematically
- Learn topics not usually covered in an introductory course, such as Bayesian estimation
- Import data from almost any source using Python, rather than being limited to data that has been cleaned and formatted for statistics tools
- Use statistical inference to answer questions about real-world data
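As a rough illustration of the computational approach the list above describes, here is a minimal sketch that draws repeated samples from an exponential distribution and checks how their means behave. It is not taken from the book: the helper name `sample_means`, the rate of 2.0, the sample size, and the seed are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import random
import statistics


def sample_means(draw, n, trials, seed=0):
    """Return the means of `trials` samples, each of size `n`.

    `draw` is a hypothetical callback that takes a random.Random
    instance and returns one value from the distribution of interest.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    return [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(trials)
    ]


# Exponential distribution with rate 2.0, so the true mean is 1 / 2.0 = 0.5.
means = sample_means(lambda rng: rng.expovariate(2.0), n=100, trials=1000)

grand_mean = statistics.fmean(means)  # should land close to 0.5
spread = statistics.stdev(means)      # theory suggests about 0.5 / sqrt(100) = 0.05

print(f"mean of sample means:    {grand_mean:.3f}")
print(f"std dev of sample means: {spread:.3f}")
```

Swapping the lambda for another generator, such as `rng.gauss(0, 1)` or `rng.paretovariate(3)`, lets the same experiment be rerun against a different distribution.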
Table of Contents
Chapter 1 Statistical Thinking for Programmers
Do First Babies Arrive Late?
A Statistical Approach
The National Survey of Family Growth
Tables and Records
Significance
Glossary
Chapter 2 Descriptive Statistics
Means and Averages
Variance
Distributions
Representing Histograms
Plotting Histograms
Representing PMFs
Plotting PMFs
Outliers
Other Visualizations
Relative Risk
Conditional Probability
Reporting Results
Glossary
Chapter 3 Cumulative Distribution Functions
The Class Size Paradox
The Limits of PMFs
Percentiles
Cumulative Distribution Functions
Representing CDFs
Back to the Survey Data
Conditional Distributions
Random Numbers
Summary Statistics Revisited
Glossary
Chapter 4 Continuous Distributions
The Exponential Distribution
The Pareto Distribution
The Normal Distribution
Normal Probability Plot
The Lognormal Distribution
Why Model?
Generating Random Numbers
Glossary
Chapter 5 Probability
Rules of Probability
Monty Hall
Poincaré
Another Rule of Probability
Binomial Distribution
Streaks and Hot Spots
Bayes’s Theorem
Glossary
Chapter 6 Operations on Distributions
Skewness
Random Variables
PDFs
Convolution
Why Normal?
Central Limit Theorem
The Distribution Framework
Glossary
Chapter 7 Hypothesis Testing
Testing a Difference in Means
Choosing a Threshold
Defining the Effect
Interpreting the Result
Cross-Validation
Reporting Bayesian Probabilities
Chi-Square Test
Efficient Resampling
Power
Glossary
Chapter 8 Estimation
The Estimation Game
Guess the Variance
Understanding Errors
Exponential Distributions
Confidence Intervals
Bayesian Estimation
Implementing Bayesian Estimation
Censored Data
The Locomotive Problem
Glossary
Chapter 9 Correlation
Standard Scores
Covariance
Correlation
Making Scatterplots in Pyplot
Spearman’s Rank Correlation
Least Squares Fit
Goodness of Fit
Correlation and Causation
Glossary
Colophon