Instant Data Intensive Apps with pandas How-to

ReneeBK

2015-01-05

Bounty: 1 forum coin (resolved)

Trent Hauck
May 2013
Manipulate, visualize, and analyze your data with pandas
Book Description
Publication Date: May 23, 2013
In Detail
Pandas helps to alleviate a genuinely complex situation in data analytics libraries. Many incumbent languages aren't approachable or are fairly unproductive in general computing tasks in comparison to Python. However, with Pandas it's easy to begin working with tabular datasets in a language that's easier to learn and use.
Instant Data Intensive Apps with Pandas How-to starts with Pandas’ functionalities such as joining datasets, cleaning data, and other data munging tasks. It quickly moves on to building a data reporting tool, which uses analysis in Pandas to determine what’s relevant and presents that data in an easy-to-consume manner.
Instant Data Intensive Apps with Pandas How-to starts with data manipulation and other practical tasks for a fundamental understanding, and through successive recipes you will gain a more profitable understanding of Pandas.
Throughout this book the recipes are presented in a structured way. It starts with data transformation techniques, but builds up to more complex examples such as performing statistical analysis and integrating Pandas objects with web applications. The other recipes cover visualization and machine learning, among other things.
Instant Data Intensive Apps with Pandas How-to will get the reader up and running quickly with Pandas and put the user in a position to move up the learning curve faster.
Approach
Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. This book has a practical approach with step-by-step recipes to help readers get to grips with Pandas.
Who this book is for
Users of other data analysis tools will find value in seeing tasks they commonly encounter translated to Pandas, and users of Python will encounter an introduction to a very impressive tool in a syntax they inherently know. In terms of general skills, it is assumed that the reader understands basic data structures such as arrays, lists, and dictionaries (hash maps), as well as having some understanding of command line work. Installing Pandas is not covered, but the online documentation is straightforward. Also, readers are encouraged to use IPython to interact and experiment with the code.

Product Details

File Size: 594 KB
Print Length: 50 pages
Publisher: Packt Publishing (May 23, 2013)
Sold by: Amazon Digital Services, Inc.
Language: English
ASIN: B00CZ6Y0QW
Text-to-Speech: Enabled

Best answer

tigerwolf

**** This content has been hidden by the author ****

All replies

tigerwolf

2015-1-5 02:28:40

Hidden content of this post:

Instant Data Intensive Apps with Pandas How-to (PacktPub 2013) Trent Hauck.rar
Size: 924.45 KB

Download cost: 20 forum coins

This attachment contains:

Instant Data Intensive Apps with Pandas How-to (PacktPub 2013) Trent Hauck.pdf

ReneeBK

2015-1-5 02:30:53

Working with files (Simple)

In this recipe we'll introduce the pandas DataFrame by doing some quick exercises, then move on to one of the most fundamental parts of data analysis: getting data in and out of files.

Getting ready

Most of the rest of the book is working with data once it's in a pandas data structure, but this recipe is about those structures themselves and getting data in and out of them. Open your interpreter, preferably IPython.

How to do it...

Create an incredibly simple DataFrame to start with. A DataFrame can handle lists, NumPy arrays, dicts of strings, and more.
The first example is too simple, and isn't useful. Add some column headers and an index for more information about the DataFrame.
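
The code for these steps did not survive in this post. A minimal sketch of what they describe might look like the following; the column names, index labels, and values are illustrative assumptions, and a StringIO buffer stands in for a file on disk.

```python
import io

import numpy as np
import pandas as pd

# A bare DataFrame from a nested list: default integer index and columns.
simple = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# The same values with column headers and a row index.
# The labels are illustrative, not taken from the book.
df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, 6]]),
    columns=["a", "b", "c"],
    index=["row_1", "row_2"],
)

# Round-trip through CSV text; the buffer stands in for a real file.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
```

Passing index_col=0 on the way back in tells read_csv to treat the first column as the row index rather than as data.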

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

How it works...

Most of the file input and output in pandas is the orchestration behind the scenes of formatting the value outputs, and then writing those values to a file. There are many options for formatting file output. The to_csv method takes many parameters. Some of the more common parameters are as follows:

sep: It specifies the value to separate with, in the output file
index: It is a Boolean that decides whether or not to print the index
na_rep: It specifies what to substitute for the na values

The following snippet writes the DataFrame df to a file called file.tsv, formatted according to the parameters passed to the method.
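
Since the original snippet is missing from this post, here is a hedged reconstruction using the three parameters listed above. The sample DataFrame and the "NULL" placeholder are assumptions, and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.tsv")
    # Tab-separated, without the index, with missing values written as NULL.
    df.to_csv(path, sep="\t", index=False, na_rep="NULL")
    with open(path) as handle:
        contents = handle.read()
```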

There's more...

In addition to standard file input and output functionalities, pandas has several built-in niceties.

Parsing dates at file read time

Using pandas' sophisticated date parser, read_csv can read a CSV file and parse its dates at the same time, as shown in the following command line:
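
A small sketch of date parsing at read time, assuming a CSV with a column named date (the column names and values here are made up for illustration):

```python
import io

import pandas as pd

csv_text = "date,value\n2012-01-01,10\n2012-01-02,20\n"

# parse_dates converts the named column to timestamps while reading,
# and index_col makes it the index of the resulting DataFrame.
parsed = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],
    index_col="date",
)
```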

Besides the parsing capabilities, pandas also has a very handy date_range function, which returns a range of dates determined by the inputs. For example, it's very easy to get the months of 2012 in a series. This is shown in the following command line:
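
One way to get the months of 2012 is sketched below. It uses the month-start alias "MS", which is an assumption on my part; the original command is not preserved here, and the spelling of the month-end alias has changed across pandas releases.

```python
import pandas as pd

# One timestamp per month, anchored at the first day of each month.
months = pd.date_range("2012-01-01", "2012-12-31", freq="MS")
month_series = pd.Series(months)
```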

Accessing data from a public source

pandas can also read CSV data from the Web, assuming http://www.example.com/data.csv is the URL. Take a look at the following example:
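
A minimal sketch, assuming the placeholder URL from the text: read_csv accepts a URL in the same position as a local path. The helper name load_csv is hypothetical, and the remote call is left commented out because the address is only an example.

```python
import pandas as pd


def load_csv(location):
    """Read CSV data from a URL or a local path into a DataFrame."""
    return pd.read_csv(location)


# With the placeholder address from the text:
# df = load_csv("http://www.example.com/data.csv")
```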

ReneeBK

2015-1-5 02:34:26

Slicing pandas objects (Simple)

In this recipe we'll walk through the basics of slicing pandas objects. If you're familiar with array slicing, most of this will look familiar, with a few idiosyncrasies specific to pandas.

Getting ready

Open up your interpreter, and execute the following steps:

How to do it...

Create a simple DataFrame to explore the different slicing abilities of pandas.
Select the first two rows of the column named 'one'.
Pass an array of column names instead of 'one'.
Use a negative index to navigate backwards through the DataFrame.
Select every fifth row from the DataFrame df.
Use the head and tail functions to easily select the top and bottom of the DataFrame.
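
The commands for these steps are missing from this post. The sketch below follows the step descriptions; the size of the DataFrame and the column names other than 'one' are assumptions.

```python
import numpy as np
import pandas as pd

# 20 rows and three columns of consecutive integers.
df = pd.DataFrame(
    np.arange(60).reshape(20, 3),
    columns=["one", "two", "three"],
)

first_two = df["one"][:2]          # first two rows of column 'one'
two_cols = df[["one", "two"]][:2]  # a list of column names instead of 'one'
last_row = df["one"][-1:]          # negative index counts from the end
every_fifth = df[::5]              # rows 0, 5, 10 and 15
top = df.head(3)                   # first three rows
bottom = df.tail(3)                # last three rows
```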

How it works...

At some level, pandas objects behave similarly to NumPy arrays; they are, after all, abstractions built on top of them. However, because we have more metadata about the data structures, we can use that to our advantage.

After the initial pandas object is created, simple slicing occurs according to the following structure:

df[column names][rows]

Here column names is a string (or a list, if multiple columns are wanted) and rows is a slice selecting the rows that we wish to use.

There's more...

The methods that have already been described are very useful at a higher level, but there are more granular operations available.

Direct index access

The .ix indexer is an advanced method for selecting and slicing a DataFrame. Taking the sample from the preceding example, df.ix[1:3, ['one', 'two']] = 10 will not only select the specified subset of the data, but also set its value equal to 10. The .xs method has a more explicit interface for working with indexes.
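
In current pandas releases .ix is no longer available, and .loc is the label-based equivalent. A sketch of the same assignment, using an assumed 5x3 DataFrame of zeros:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((5, 3)), columns=["one", "two", "three"])

# Select and assign in one step. With .loc the row slice 1:3 is
# label-based and includes both endpoints, so rows 1, 2 and 3 are set.
df.loc[1:3, ["one", "two"]] = 10

# .xs returns the cross-section for a single index label (here, row 2).
row_two = df.xs(2)
```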

Resetting the index

Often, the index of the DataFrame becomes out of alignment when slicing data. In pandas, the easiest way to reset an index is with the reset_index() method of the DataFrame object.
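
For example, taking every fifth row leaves gaps in the index labels, which reset_index can renumber. The DataFrame below is an illustrative assumption:

```python
import pandas as pd

df = pd.DataFrame({"one": range(10)})

subset = df[::5]  # keeps the original labels 0 and 5

# drop=True discards the old labels instead of keeping them as a column.
renumbered = subset.reset_index(drop=True)
```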

nkunku

2015-1-5 03:03:22

Using indexes to manipulate objects (Medium)

Indexes are an advanced topic not because they are difficult, but because using them well is essential to becoming an expert with pandas. We will discuss hierarchical indexes in the following There's more... section.

Getting ready

A good understanding of indexes in pandas is crucial to quickly move the data around. From a business intelligence perspective, they create a distinction similar to that of metrics and dimensions in an OLAP cube. To illustrate this point, this recipe walks through getting stock data out of pandas, combining it, then reindexing it for easy chomping.

How to do it...

Use the DataReader object to transfer stock price information into a DataFrame and to explore the basic axes of a Panel.
Use the axis selectors to easily compute different sets of summary statistics.
# major axis is sliceable as well
> day_slice = pan.major_axis[1]
> pan.major_xs(day_slice)[['gs', 'ba']]
Perform the analogous operations as in the preceding examples on the newly created DataFrame.
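
Most of the commands for these steps are missing, and Panel has since been removed from pandas while DataReader now lives in the separate pandas-datareader package. The sketch below therefore swaps in a DataFrame with a (ticker, timestamp) MultiIndex and randomly generated numbers in place of downloaded prices; 'gs' and 'ba' come from the surviving fragment, and everything else is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2012-01-02", periods=5, freq="B")
tickers = ["ba", "gs", "ibm"]  # 'ibm' is an illustrative extra ticker

# Long-format frame indexed by (ticker, timestamp); synthetic values only.
index = pd.MultiIndex.from_product(
    [tickers, dates], names=["ticker", "timestamp"]
)
prices = pd.DataFrame(
    {
        "close": rng.uniform(50, 150, len(index)),
        "volume": rng.integers(1_000, 10_000, len(index)),
    },
    index=index,
)

# Summary statistics along one index level.
mean_close = prices["close"].groupby(level="ticker").mean()

# Cross-section for a single day, restricted to two tickers.
day_slice = dates[1]
one_day = prices.xs(day_slice, level="timestamp").loc[["gs", "ba"]]
```

The xs call plays the role that major_xs played on a Panel: it fixes one value of the timestamp level and returns the remaining ticker-by-column table.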

How it works...

The previous example was certainly contrived, but when indexing and statistical techniques are incorporated, the power of pandas begins to come through. Statistics will be covered in an upcoming recipe.

pandas' indexes by themselves can be thought of as descriptors of a certain point in the DataFrame. When ticker and timestamp are the only indexes in a DataFrame, then the point is individualized by the ticker, timestamp, and column name. After the point is individualized, it's more convenient for aggregation and analysis.

There's more...

Indexes show up all over the place in pandas so it's worthwhile to see some other use cases as well.

Advanced header indexes

Hierarchical indexing isn't limited to rows. Headers can also be represented by MultiIndex, as shown in the following command line:
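
A minimal sketch of hierarchical column headers, with assumed level names and placeholder integers as data:

```python
import numpy as np
import pandas as pd

# Two header levels: ticker on top, field underneath.
columns = pd.MultiIndex.from_product(
    [["ba", "gs"], ["close", "volume"]], names=["ticker", "field"]
)
wide = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

gs_only = wide["gs"]                              # all fields for one ticker
closes = wide.xs("close", axis=1, level="field")  # one field for all tickers
```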

Performing aggregate operations with indexes

As a prelude to the following sections, we'll do a single groupby function here since they work with indexes so well.

This answers the question: for each ticker and for each day of the week (not date), what was the mean volume over the life of the data?
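
The groupby call itself is not preserved in this post. One way to express that aggregation, assuming a Series of volumes indexed by (ticker, timestamp) and filled with placeholder numbers:

```python
import numpy as np
import pandas as pd

# Two business weeks starting on a Monday.
dates = pd.date_range("2012-01-02", periods=10, freq="B")
index = pd.MultiIndex.from_product(
    [["ba", "gs"], dates], names=["ticker", "timestamp"]
)
volume = pd.Series(np.arange(len(index), dtype=float), index=index, name="volume")

# Group by the ticker level and by the weekday derived from the
# timestamp level (Monday is 0).
ticker_key = index.get_level_values("ticker")
weekday_key = index.get_level_values("timestamp").dayofweek
mean_volume = volume.groupby([ticker_key, weekday_key]).mean()
```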

Lisrelchen

2015-1-5 03:03:33

Working with dates (Medium)

In this recipe we'll talk about working with dates in pandas. Because pandas was initially written with financial time series in mind, it has a lot of out-of-the-box date functionality.

Getting ready

Open up your interpreter and follow the command progression in the following section. Difficult financial analysis was the mother of pandas creation; therefore, it has many efficient and easy ways for dealing with dates.

How to do it...

Let's examine the date_range functionality within pandas.
Create a time series and slice it by passing a range of dates to Series.
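
The commands for these two steps are missing here. A sketch under the assumption of a daily series covering 2012, with consecutive integers as placeholder data:

```python
import numpy as np
import pandas as pd

# Every day of 2012 (a leap year).
days = pd.date_range("2012-01-01", "2012-12-31", freq="D")
ts = pd.Series(np.arange(len(days)), index=days)

# Slice by a range of dates; label-based slices include both endpoints.
march = ts["2012-03-01":"2012-03-31"]

# Partial-string indexing selects every observation in one month.
june = ts.loc["2012-06"]
```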

How it works...

The date_range function is defined by dates and frequencies. See the following section for the various frequency designations. The easiest way is to define a start date, end date, and frequency, but there are other ways as well. You can also change the frequency, or resample to a smaller or larger time interval.

There's more...

pandas adds a lot more functionalities to handle dates. These are mostly convenient methods because working with dates is a necessary evil of data analysis.

Alternative date range specification

Time series in pandas don't have to be defined by a start and end date. In pandas, it is possible to represent the time of the Series as an interval of dates with a common period between data points. For example, if we want to create a Series just like Y2K, we can do so as follows:
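
A sketch of that alternative specification, using a start date plus a number of periods instead of an end date. The choice of ten periods and a daily frequency is an assumption, since the original command is not preserved.

```python
import pandas as pd

# Ten daily timestamps starting on 1 January 2000.
y2k = pd.date_range("2000-01-01", periods=10, freq="D")
ts = pd.Series(range(10), index=y2k)
```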

Upsampling and downsampling Series

pandas offers the ability to move up and down the granularity of a time series. For example, given a Series of random numbers s for all the days in 2012, the sum for each month is calculated as follows:

In the preceding example, the 'M' argument specifies that we're downsampling to monthly frequency. Upsampling is done in a similar way; however, because it creates rows that have no data, pandas provides functionalities for handling the disaggregation in a convenient way.
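
A hedged sketch of both directions: aggregating daily values to one value per month, then expanding the monthly values back to a daily grid. It uses the month-start alias "MS" rather than 'M', because the spelling of the month-end alias has changed across pandas releases, and a series of ones instead of random numbers so the sums are easy to check.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2012-01-01", "2012-12-31", freq="D")
s = pd.Series(np.ones(len(days)), index=days)

# Daily to monthly: each month's observations are combined with a sum.
monthly = s.resample("MS").sum()

# Monthly to daily: the new rows have no data of their own, so a fill
# rule is needed; ffill carries the last known value forward.
daily_again = monthly.resample("D").ffill()
```

Because the monthly index is anchored at month starts, the expanded series ends on the last monthly label rather than on 31 December.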

Lisrelchen

2015-1-5 03:10:26

Modifying data with functions (Simple)

In this recipe we'll walk through the process of applying a function to a DataFrame. This is a simple but very important part of data analysis. Rarely, if ever, will data in raw form be sufficient for data analysis. Often, that data needs to be transformed into some other form, and to do that you'll need to apply functions to pandas objects.

Getting ready

Open up your interpreter, and type the following commands successively.

How to do it...

Create a simple Series of simulated open and close prices for a year.
Apply element-wise functions.
Define a standalone function that takes two arguments: the element itself and one additional argument.
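
The commands for these steps are not preserved in this post. A sketch of what they describe, with randomly generated prices and a hypothetical helper named scale:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
days = pd.date_range("2012-01-01", "2012-12-31", freq="D")

# Simulated open and close prices; the value range is arbitrary.
df = pd.DataFrame(
    {
        "open": rng.uniform(10, 20, len(days)),
        "close": rng.uniform(10, 20, len(days)),
    },
    index=days,
)

# Apply a function to the values of one column.
sqrt_open = df["open"].apply(np.sqrt)


# A standalone function that takes the element plus one extra argument.
def scale(x, factor):
    return x * factor


# Extra positional arguments are passed through the args tuple.
doubled_close = df["close"].apply(scale, args=(2,))
```
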
How it works...

pandas sits on top of NumPy; thus pandas takes advantage of the broadcasting capabilities inherent within NumPy. For example, execute the following script to see the differences in NumPy:
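
The original script is missing here; the following is a small illustration of broadcasting with made-up values, first in NumPy and then in pandas:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# A scalar is broadcast across every element of the array.
arr_plus_one = arr + 1

# A length-3 row is broadcast across each row of a 2x3 array.
grid = np.zeros((2, 3)) + arr

# A pandas Series shows the same behaviour for scalar operations.
ser_plus_one = pd.Series(arr) + 1
```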

Understanding the underlying NumPy structure is beyond the scope of this recipe, but is extremely helpful in the long run.

There's more...

pandas makes additional use of the apply function in place of for loops. Quite often it's necessary to do more complex operations on one or more entire columns of a DataFrame, where broadcasting or looping won't cut it.

Other apply options

There are other apply functions in the family. For example, the applymap function operates in a slightly different manner than the apply function. The applymap function operates on a single value and returns a single value, whereas the apply function takes an array-like data structure as an input.
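
A short sketch of the difference on an assumed two-column DataFrame. Recent pandas releases expose the element-wise method as DataFrame.map and older ones as applymap, so the example picks whichever is available.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# apply hands each column (a Series) to the function.
col_ranges = df.apply(lambda col: col.max() - col.min())

# The element-wise variant hands over one value at a time.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)
```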

Alternative solutions

Functions can also be applied iteratively; however, this tends to make the functions slow and leads to unnecessarily verbose code.
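
To make the comparison concrete, here is a row-by-row loop next to the equivalent vectorised expression. The column names and values are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"open": [10.0, 11.0, 12.0], "close": [10.5, 10.0, 12.5]})

# Row-by-row loop: correct, but verbose and slow on large frames.
changes = []
for _, row in df.iterrows():
    changes.append(row["close"] - row["open"])
looped = pd.Series(changes, index=df.index)

# Equivalent vectorised expression.
vectorised = df["close"] - df["open"]
```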

Lisrelchen

2015-1-5 03:14:11

Combining datasets (Medium)

Given that we have several different types of DataFrames, how can we best join them into one DataFrame for additional use? We'll also talk about merging and appending them in the following There's more... section.

Getting ready

Open up your interpreter and type the following commands successively. Very rarely will an analyst receive data in a single flat file. Quite often, data will need to be either appended to the bottom of the DataFrame or attached to the side. For example, if a set of data comes directly from a normalized database, the analyst will need to combine them by joining them using Primary and Foreign Keys.

How to do it...

Create two basic DataFrames df1 and df2.
The concat method is similar to the union command in SQL.
Merge the two DataFrames into a single DataFrame.
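
The commands for these steps did not survive in this post. A minimal sketch, assuming two small DataFrames that share a column named key (all names and values are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Stack rows on top of each other; duplicates are kept, so this is
# closest to UNION ALL in SQL terms.
stacked = pd.concat([df1, df1], ignore_index=True)

# Join on the shared column; the default is an inner join.
merged = pd.merge(df1, df2, on="key")
```
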

How it works...

If the reader is familiar with R's functionalities, then he/she can see that joining data in pandas is not much different than in R. We'll cover more on indexes later, but thinking of the default index as a Primary Key, or the combination of hierarchical index as a Composite Key, elucidates the joining process.

There's more...

There are many options that can be supplied to the merge and join methods to modify the DataFrames' behaviour.

Merge and join details

The merge (and join) method takes a how parameter, a string naming the type of join, as in a database. The possible values are 'left', 'right', 'outer', and 'inner'.
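
The effect of each value is easiest to see by counting the rows it returns. The two DataFrames below are illustrative assumptions that overlap on two of their keys.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Number of rows produced by each kind of join.
row_counts = {
    how: len(pd.merge(df1, df2, on="key", how=how))
    for how in ("left", "right", "outer", "inner")
}
```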

Specifying outputs in join

The join function (not the previously mentioned one) is easy to use to join DataFrames.
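
The original example is missing here. A sketch of DataFrame.join on two assumed frames that share some index labels, including the suffix parameters that control how overlapping column names appear in the output:

```python
import pandas as pd

left = pd.DataFrame({"left_val": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"right_val": [20, 30, 40]}, index=["b", "c", "d"])

# join aligns on the index and defaults to a left join, so label 'a'
# has no match and gets a missing value.
joined = left.join(right)

# lsuffix and rsuffix disambiguate overlapping column names.
overlap = left.join(left, lsuffix="_l", rsuffix="_r")
```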
