Table of Contents
Preface 1
Chapter 1: Getting Started with Python Libraries 9
Software used in this book 10
Installing software and setup 10
On Windows 10
On Linux 12
On Mac OS X 13
Building NumPy, SciPy, matplotlib, and IPython from source 14
Installing with setuptools 15
NumPy arrays 16
A simple application 16
Using IPython as a shell 19
Reading manual pages 22
IPython notebooks 22
Where to find help and references 23
Summary 23
Chapter 2: NumPy Arrays 25
The NumPy array object 25
The advantages of NumPy arrays 26
Creating a multidimensional array 27
Selecting NumPy array elements 27
NumPy numerical types 28
Data type objects 30
Character codes 30
The dtype constructors 31
The dtype attributes 31
One-dimensional slicing and indexing 32
Manipulating array shapes 32
Stacking arrays 35
Splitting NumPy arrays 39
NumPy array attributes 41
Converting arrays 48
Creating array views and copies 48
Fancy indexing 50
Indexing with a list of locations 52
Indexing NumPy arrays with Booleans 53
Broadcasting NumPy arrays 55
Summary 58
Chapter 3: Statistics and Linear Algebra 59
NumPy and SciPy modules 59
Basic descriptive statistics with NumPy 63
Linear algebra with NumPy 66
Inverting matrices with NumPy 66
Solving linear systems with NumPy 68
Finding eigenvalues and eigenvectors with NumPy 69
NumPy random numbers 71
Gambling with the binomial distribution 72
Sampling the normal distribution 74
Performing a normality test with SciPy 75
Creating a NumPy-masked array 78
Disregarding negative and extreme values 80
Summary 83
Chapter 4: pandas Primer 85
Installing and exploring pandas 86
pandas DataFrames 87
pandas Series 90
Querying data in pandas 94
Statistics with pandas DataFrames 97
Data aggregation with pandas DataFrames 99
Concatenating and appending DataFrames 103
Joining DataFrames 105
Handling missing values 108
Dealing with dates 110
Pivot tables 113
Remote data access 114
Summary 117
Chapter 5: Retrieving, Processing, and Storing Data 119
Writing CSV files with NumPy and pandas 120
Comparing the NumPy .npy binary format and pickling pandas DataFrames 122
Storing data with PyTables 124
Reading and writing pandas DataFrames to HDF5 stores 126
Reading and writing to Excel with pandas 129
Using REST web services and JSON 131
Reading and writing JSON with pandas 132
Parsing RSS and Atom feeds 134
Parsing HTML with Beautiful Soup 135
Summary 142
Chapter 6: Data Visualization 143
matplotlib subpackages 144
Basic matplotlib plots 144
Logarithmic plots 146
Scatter plots 148
Legends and annotations 150
Three-dimensional plots 153
Plotting in pandas 155
Lag plots 158
Autocorrelation plots 159
Plot.ly 160
Summary 163
Chapter 7: Signal Processing and Time Series 165
statsmodels subpackages 166
Moving averages 167
Window functions 168
Defining cointegration 170
Autocorrelation 173
Autoregressive models 176
ARMA models 179
Generating periodic signals 181
Fourier analysis 184
Spectral analysis 186
Filtering 187
Summary 189
Chapter 8: Working with Databases 191
Lightweight access with sqlite3 192
Accessing databases from pandas 194
SQLAlchemy 196
Installing and setting up SQLAlchemy 196
Populating a database with SQLAlchemy 198
Querying the database with SQLAlchemy 200
Pony ORM 201
Dataset – databases for lazy people 202
PyMongo and MongoDB 204
Storing data in Redis 206
Apache Cassandra 207
Summary 210
Chapter 9: Analyzing Textual Data and Social Media 211
Installing NLTK 212
Filtering out stopwords, names, and numbers 214
The bag-of-words model 216
Analyzing word frequencies 217
Naive Bayes classification 219
Sentiment analysis 222
Creating word clouds 225
Social network analysis 230
Summary 232
Chapter 10: Predictive Analytics and Machine Learning 233
A tour of scikit-learn 235
Preprocessing 236
Classification with logistic regression 238
Classification with support vector machines 240
Regression with ElasticNetCV 242
Support vector regression 245
Clustering with affinity propagation 248
Mean Shift 250
Genetic algorithms 252
Neural networks 257
Decision trees 259
Summary 261
Chapter 11: Environments Outside the Python Ecosystem and Cloud Computing 263
Exchanging information with MATLAB/Octave 264
Installing rpy2 265
Interfacing with R 265
Sending NumPy arrays to Java 268
Integrating SWIG and NumPy 269
Integrating Boost and Python 272
Using Fortran code through f2py 274
Setting up Google App Engine 275
Running programs on PythonAnywhere 276
Working with Wakari 277
Summary 278
Chapter 12: Performance Tuning, Profiling, and Concurrency 279
Profiling the code 280
Installing Cython 284
Calling C code 288
Creating a process pool with multiprocessing 290
Speeding up embarrassingly parallel for loops with Joblib 293
Comparing Bottleneck to NumPy functions 294
Performing MapReduce with Jug 296
Installing MPI for Python 298
IPython Parallel 299
Summary 303
Appendix A: Key Concepts 305
Appendix B: Useful Functions 311
matplotlib 311
NumPy 312
pandas 313
Scikit-learn 314
SciPy 315
scipy.fftpack 315
scipy.signal 315
scipy.stats 315
Appendix C: Online Resources 31