2015-01-05
Bounty: 1 forum coin (resolved)

Trent Hauck
May 2013
Manipulate, visualize, and analyze your data with pandas
Book Description
Publication Date: May 23, 2013
In Detail
Pandas helps to alleviate a genuinely complex situation in data analytics libraries. Many incumbent languages aren't approachable or are fairly unproductive in general computing tasks in comparison to Python. However, with Pandas it's easy to begin working with tabular datasets in a language that's easier to learn and use.
Instant Data Intensive Apps with Pandas How-to starts with Pandas’ functionalities such as joining datasets, cleaning data, and other data munging tasks. It quickly moves on to building a data reporting tool, which uses analysis in Pandas to determine what’s relevant and then presents that data in an easy-to-consume manner.
Instant Data Intensive Apps with Pandas How-to starts with data manipulation and other practical tasks for a fundamental understanding, and through successive recipes you will gain a more profitable understanding of Pandas.
Throughout this book the recipes are presented in a structured way. It starts with data transformation techniques, but builds up to more complex examples such as performing statistical analysis and integrating Pandas objects with web applications. The other recipes cover visualization and machine learning, among other things.
Instant Data Intensive Apps with Pandas How-to will get the reader up and running quickly with Pandas and put the user in a position to move up the learning curve faster.
Approach
Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. This book has a practical approach with step-by-step recipes to help readers get to grips with Pandas.
Who this book is for
Users of other data analysis tools will find value in seeing tasks they commonly encounter translated to Pandas, and users of Python will encounter an introduction to a very impressive tool in a syntax they inherently know. In terms of general skills, it is assumed that the reader understands basic data structures such as arrays or lists and dictionaries or hash maps, and has some experience working on the command line. Installing Pandas is not covered, but the online documentation is straightforward. Also, readers are encouraged to use IPython to interact and experiment with the code.

Product Details
  • File Size: 594 KB
  • Print Length: 50 pages
  • Publisher: Packt Publishing (May 23, 2013)
  • Sold by: Amazon Digital Services, Inc.
  • Language: English
  • ASIN: B00CZ6Y0QW
  • Text-to-Speech: Enabled


Best answer

tigerwolf

**** This content has been hidden by the author ****

All replies
2015-1-5 02:28:40


Hidden content of this post:

Instant Data Intensive Apps with Pandas How-to (PacktPub 2013) Trent Hauck.rar
Size: 924.45 KB

Cost: 20 forum coins

This attachment contains:

  • Instant Data Intensive Apps with Pandas How-to (PacktPub 2013) Trent Hauck.pdf




2015-1-5 02:30:53
Working with files (Simple)

In this recipe we'll introduce the pandas DataFrame with some quick exercises, then move on to one of the most fundamental parts of data analysis: getting data in and out of files.

Getting ready

Most of the rest of the book deals with data once it's in a pandas data structure, but this recipe is about those structures themselves and how to get data in and out of them. Open your interpreter, preferably IPython.


How to do it...
  • Create an incredibly simple DataFrame to start with. A DataFrame can handle lists, NumPy arrays, dicts of strings, and more.

  • The first example is too simple to be useful. Add some column headers and an index to give more information about the DataFrame.
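The original listings for these steps did not survive. Here is a minimal sketch of both steps. The values, the column names ('one', 'two', 'three') and the index labels ('a', 'b') are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Step 1: about the simplest DataFrame possible, built from a list of lists.
# With no labels supplied, pandas numbers both rows and columns 0, 1, 2, ...
simple = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(simple)

# Step 2: the same values with column headers and an index (hypothetical labels),
# which makes the DataFrame far more self-describing.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=["one", "two", "three"],
    index=["a", "b"],
)
print(df)

# Other inputs work too: a NumPy array, or a dict mapping column names to values.
from_array = pd.DataFrame(np.arange(1, 7).reshape(2, 3), columns=df.columns, index=df.index)
from_dict = pd.DataFrame({"one": [1, 4], "two": [2, 5], "three": [3, 6]}, index=["a", "b"])
```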




Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.




How it works...

Most file input and output in pandas comes down to formatting the values behind the scenes and then writing those values to a file. There are many options for formatting file output, and the to_csv method takes many parameters. Some of the more common ones are as follows:

  • sep: the delimiter that separates values in the output file
  • index: a Boolean that decides whether or not to write the index
  • na_rep: the string to substitute for missing (NA) values

Calling to_csv with these parameters writes the DataFrame df to a file called file.tsv, formatted according to the parameters passed to the method.

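Here is a minimal sketch of such a call. It uses a small made-up DataFrame with one missing value so that na_rep has something to replace. The file name file.tsv comes from the text:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the NaN in column 'one' shows what na_rep does.
df = pd.DataFrame(
    {"one": [1.0, np.nan, 3.0], "two": ["x", "y", "z"]},
    index=["a", "b", "c"],
)

# sep="\t"    -> tab-separated values, hence the .tsv extension
# index=False -> leave the index labels out of the file
# na_rep="NA" -> write missing values as the text NA rather than an empty field
df.to_csv("file.tsv", sep="\t", index=False, na_rep="NA")

with open("file.tsv") as fh:
    print(fh.read())

# Reading it back: read_csv treats "NA" as a missing value by default.
roundtrip = pd.read_csv("file.tsv", sep="\t")
```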


There's more...

In addition to standard file input and output functionalities, pandas has several built-in niceties.

Parsing dates at file read time

Using pandas' sophisticated date parser, read_csv can read a CSV file and parse its dates at the same time.

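Here is a minimal sketch. It assumes a small made-up CSV held in memory. With a real file you would pass its path instead of the io.StringIO object:

```python
import io

import pandas as pd

# Made-up CSV text; with a real file, pass its path instead of the StringIO object.
csv_text = """date,value
2012-01-01,10
2012-01-02,12
2012-01-03,9
"""

# parse_dates converts the 'date' column to datetimes while the file is read,
# and index_col makes those datetimes the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
print(df.index)
```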

Besides the parsing capabilities, pandas also has a very handy date_range function, which returns a range of dates determined by the inputs. For example, it's very easy to get the months of 2012 in a series.

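Here is a sketch of the months of 2012 as a series. The text doesn't show which frequency alias the book used. This sketch assumes 'MS' (month start), which sidesteps the month-end alias that pandas 2.2 renamed from 'M' to 'ME':

```python
import pandas as pd

# The first day of each month in 2012 ("MS" = month start).
months = pd.date_range("2012-01-01", "2012-12-31", freq="MS")
print(len(months))  # 12

# The same dates as a Series, here holding each month's name.
month_series = pd.Series(months.month_name(), index=months)
```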


Accessing data from a public source

pandas can also read CSV data directly from the Web: pass the URL (here, assume it is http://www.example.com/data.csv) to read_csv in place of a file name.

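Here is a minimal sketch. The helper name load_public_csv is hypothetical. The example.com address is the text's placeholder, so the actual fetch is left commented out. read_csv accepts URLs (the http, ftp, s3, gs and file schemes among them) in place of a local path:

```python
import pandas as pd

# The placeholder URL from the text; it does not serve real data.
URL = "http://www.example.com/data.csv"


def load_public_csv(url: str) -> pd.DataFrame:
    """Hypothetical helper: read_csv fetches and parses the file behind the URL."""
    return pd.read_csv(url)


# Requires network access and a real CSV file at the address:
# df = load_public_csv(URL)
```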






2015-1-5 02:34:26
Slicing pandas objects (Simple)

In this recipe we'll walk through some basic functionality for slicing pandas objects. If you're familiar with array slicing, this will feel familiar, but with a few pandas-specific idiosyncrasies.

Getting ready

Open up your interpreter and execute the following steps:


How to do it...
  • Create a simple DataFrame to explore the different slicing abilities of pandas.


  • Select the first two rows of the column named 'one'.

  • Pass an array of column names instead of 'one'.

  • Use a negative index to navigate backwards through the DataFrame.

  • Select every fifth row from the DataFrame df.

  • Use the head and tail functions to easily select the top and bottom of the DataFrame.
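The listings for these steps were lost. The sketch below reproduces each step on a made-up 20-row DataFrame. Apart from 'one', which the steps name, the column names are assumptions:

```python
import numpy as np
import pandas as pd

# A simple, made-up DataFrame: 20 rows, three numeric columns.
df = pd.DataFrame(np.arange(60).reshape(20, 3), columns=["one", "two", "three"])

first_two = df["one"][:2]          # first two rows of column 'one' (a Series)
two_cols = df[["one", "two"]][:2]  # a list of names selects several columns
last_three = df[-3:]               # negative positions count back from the end
every_fifth = df[::5]              # a step of 5 gives rows 0, 5, 10, 15
top = df.head(3)                   # head/tail default to 5 rows when n is omitted
bottom = df.tail(3)
```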




How it works...

At some level, pandas objects behave similarly to NumPy arrays; they are, after all, abstractions built on top of them. However, because pandas objects carry more metadata about the data (such as column names and index labels), we can use that to our advantage.

After the initial pandas object is created, simple slicing follows the structure df[column_names][:rows].

Here column_names is a string (or a list of strings, for multiple columns) and rows is the number of rows that we wish to select.
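The following small sketch makes the comparison with NumPy concrete. take is a hypothetical helper that simply spells out the pattern. NumPy needs column positions, while pandas can use the column names:

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(4, 3)
df = pd.DataFrame(arr, columns=["one", "two", "three"])


def take(frame, column_names, rows):
    """Hypothetical helper spelling out the df[column_names][:rows] pattern."""
    return frame[column_names][:rows]


# NumPy selects columns by position ...
np_version = arr[:2, [0, 2]]
# ... while pandas can select the same cells by column name.
pd_version = take(df, ["one", "three"], 2)
print((pd_version.to_numpy() == np_version).all())  # True
```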


There's more...

The methods that have already been described are very useful at a higher level, but there are more granular operations available.

Direct index access

The .ix indexer is an advanced method for selecting and slicing a DataFrame. Using the DataFrame from the preceding example, df.ix[1:3, ['one', 'two']] = 10 will not only select the specified subset of the data, but also set its values equal to 10. The .xs method has a more explicit interface for working with indexes.
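.ix was deprecated in pandas 0.20 and removed in pandas 1.0. The sketch below therefore uses .loc (label-based) and .iloc (position-based) instead, on a made-up DataFrame of zeros with a default integer index. On such an index, .ix slices were label-based and included their end point, and so do .loc slices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((5, 3)), columns=["one", "two", "three"])

# Modern equivalent of df.ix[1:3, ['one', 'two']] = 10 on a default integer index:
# .loc slices by label and includes the end label, so rows 1, 2 and 3 change.
df.loc[1:3, ["one", "two"]] = 10

# .iloc is purely positional; its slices exclude the end point (rows 1 and 2 only).
df.iloc[1:3, 2] = -1

# .xs takes a cross-section by index label: here the row labelled 2.
row_two = df.xs(2)
print(row_two.tolist())  # [10.0, 10.0, -1.0]
```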


Resetting the index

Slicing often leaves the index of the resulting DataFrame with gaps, or no longer starting at zero. In pandas, the easiest way to reset an index is with the reset_index() method of the DataFrame object.
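Here is a minimal sketch with a made-up DataFrame. Filtering to the odd values leaves gaps in the index, and reset_index renumbers the rows:

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})

# Filtering keeps the original labels, leaving gaps: 1, 3, 5, 7, 9.
odd = df[df["value"] % 2 == 1]

# reset_index() renumbers the rows 0..n-1; drop=True discards the old labels.
renumbered = odd.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]

# Without drop=True, the old labels are kept as a new 'index' column.
kept = odd.reset_index()
```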





2015-1-5 03:03:22
Using indexes to manipulate objects (Medium)

Indexes aren't an advanced topic because they're difficult, but because using them well is important if we want to become experts with pandas. We will discuss hierarchical indexes in the There's more... section of this recipe.

Getting ready

A good understanding of indexes in pandas is crucial to moving data around quickly. From a business intelligence perspective, they create a distinction similar to that between metrics and dimensions in an OLAP cube. To illustrate this point, this recipe walks through getting stock data with pandas, combining it, and then reindexing it for easy analysis.


How to do it...
  • Use DataReader to load stock price information into pandas and explore the basic axes of a Panel.

  • Use the axis selectors to easily compute different sets of summary statistics.
    # major axis is sliceable as well
    day_slice = pan.major_axis[1]
    pan.major_xs(day_slice)[['gs', 'ba']]
  • Perform the analogous operations as in the preceding examples on the newly created DataFrame.
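DataReader has since moved to the separate pandas-datareader package, and Panel was deprecated in pandas 0.20 and later removed, so this recipe's code no longer runs on current pandas. The following minimal sketch covers the same ideas with a plain DataFrame and a (ticker, date) MultiIndex. The prices and volumes are random, made-up numbers rather than market data. Only the tickers 'gs' and 'ba' come from the text:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for downloaded data: five business days for two tickers.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2013-01-02", periods=5)
tickers = ["gs", "ba"]

index = pd.MultiIndex.from_product([tickers, dates], names=["ticker", "date"])
prices = pd.DataFrame(
    {
        "Close": rng.uniform(50, 150, len(index)).round(2),
        "Volume": rng.integers(1_000, 10_000, len(index)),
    },
    index=index,
)

# Summary statistics per ticker ...
per_ticker = prices.groupby(level="ticker").mean()

# ... or per day across all tickers (dates formed the Panel's major axis).
volume_per_day = prices.groupby(level="date")["Volume"].sum()

# One day's cross-section, similar in spirit to pan.major_xs(day_slice).
day_slice = dates[1]
one_day = prices.xs(day_slice, level="date")
```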




How it works...

The previous example was certainly contrived, but when indexing and statistical techniques are incorporated, the power of pandas begins to come through. Statistics will be covered in an upcoming recipe.

pandas' indexes can be thought of as descriptors that locate a particular point in the DataFrame. When ticker and timestamp are the only indexes in a DataFrame, a point is uniquely identified by the ticker, timestamp, and column name. Once points are uniquely identified like this, aggregation and analysis become much more convenient.
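Here is a small sketch of that idea with a made-up two-level (ticker, timestamp) index. A ticker, a timestamp and a column name together select exactly one value:

```python
import pandas as pd

# Made-up volumes on a two-level (ticker, timestamp) index.
idx = pd.MultiIndex.from_tuples(
    [
        ("ba", pd.Timestamp("2013-01-02")),
        ("gs", pd.Timestamp("2013-01-02")),
        ("gs", pd.Timestamp("2013-01-03")),
    ],
    names=["ticker", "timestamp"],
)
df = pd.DataFrame({"Volume": [300, 100, 200]}, index=idx)

# Ticker + timestamp + column name locate exactly one value ...
single = df.loc[("ba", pd.Timestamp("2013-01-02")), "Volume"]
print(single)  # 300

# ... while fixing only the ticker leaves a smaller set of values to aggregate.
gs_total = df.xs("gs", level="ticker")["Volume"].sum()
```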


There's more...

Indexes show up all over the place in pandas so it's worthwhile to see some other use cases as well.

Advanced header indexes

Hierarchical indexing isn't limited to rows. Column headers can also be represented by a MultiIndex.

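Here is a minimal sketch with made-up numbers. It assumes tickers as the outer column level and price fields as the inner level:

```python
import numpy as np
import pandas as pd

# Two-level column headers: ticker on the outer level, field on the inner level.
columns = pd.MultiIndex.from_product([["ba", "gs"], ["Close", "Volume"]])
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

# Selecting an outer label returns a DataFrame with just the inner columns.
gs = df["gs"]

# xs picks one inner label across every ticker.
closes = df.xs("Close", axis=1, level=1)
```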


Performing aggregate operations with indexes

As a prelude to the following recipes, we'll do a single groupby operation here, since groupby works so well with indexes.

This answers the question: for each ticker and each day of the week (not calendar date), what was the mean volume over the life of the data?
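The original listing is missing. The sketch below assumes a long-format Series of made-up volumes indexed by (ticker, date). It derives the day of the week from the date level before grouping:

```python
import numpy as np
import pandas as pd

# Made-up volumes for two tickers over two full business weeks.
dates = pd.bdate_range("2013-01-07", periods=10)  # Monday 7 Jan to Friday 18 Jan 2013
idx = pd.MultiIndex.from_product([["gs", "ba"], dates], names=["ticker", "date"])
volume = pd.Series(np.arange(20), index=idx, name="Volume")

# Group by the 'ticker' index level and by the weekday name derived from 'date'.
tickers = volume.index.get_level_values("ticker")
weekdays = volume.index.get_level_values("date").day_name().rename("weekday")
mean_volume = volume.groupby([tickers, weekdays]).mean()
```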




2015-1-5 03:03:33
Working with dates (Medium)

In this recipe we'll talk about working with dates in pandas. Because pandas was initially written for financial time series, it has a lot of out-of-the-box date functionality.

Getting ready

Open up your interpreter and follow the command progression in the following section. pandas was born out of difficult financial analysis, so it has many efficient and easy ways of dealing with dates.


How to do it...
  • Let's examine the date_range functionality within pandas.

  • Create a time series and slice it by passing a range of dates to Series.
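Here is a minimal sketch of both steps with made-up values:

```python
import numpy as np
import pandas as pd

# date_range from a start date, an end date and a frequency ("D" = calendar day).
days = pd.date_range("2012-01-01", "2012-12-31", freq="D")
print(len(days))  # 366, because 2012 was a leap year

# A time series of made-up values, one per day.
ts = pd.Series(np.arange(len(days)), index=days)

# Slicing with a range of dates: unlike positional slices, both end points are included.
first_week_of_march = ts.loc["2012-03-01":"2012-03-07"]

# A partial date string selects a whole period, here every day of February.
february = ts.loc["2012-02"]
```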




How it works...

A date range is defined by dates and a frequency. See the following section for the various frequency designations. The easiest way is to define a start date, an end date, and a frequency, but there are other ways as well. You can also change the frequency, or resample to a smaller or larger time interval.
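Here is a small sketch of the difference between changing the frequency (asfreq) and resampling (resample), on made-up daily data. 2012-01-02 was a Monday, so the 14 days cover two full Monday-to-Sunday weeks, and the 'W' alias means weeks ending on Sunday:

```python
import pandas as pd

# 14 days of made-up data starting on Monday 2 January 2012.
daily = pd.Series(range(14), index=pd.date_range("2012-01-02", periods=14, freq="D"))

# asfreq changes the frequency by picking the existing value at each new
# timestamp; "W" means weekly, anchored on Sundays.
sundays = daily.asfreq("W")

# resample aggregates every value that falls into each new interval.
weekly_totals = daily.resample("W").sum()
```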


There's more...

pandas adds a lot more functionality for handling dates. These are mostly convenience methods, because working with dates is a necessary evil of data analysis.

Alternative date range specification

Time series in pandas don't have to be defined by a start and end date. It is also possible to describe the dates of a Series as a number of periods, separated by a common frequency, counted from a start date. For example, a Series starting at Y2K (January 1, 2000) can be created from just that start date, a number of periods, and a frequency.
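Here is a sketch with made-up values. It assumes daily frequency and 31 periods, since the text doesn't say which were used. The PeriodIndex variant at the end is an alternative representation where each label is a whole interval:

```python
import pandas as pd

# Start date + number of periods + frequency, instead of start + end dates.
y2k_days = pd.date_range("2000-01-01", periods=31, freq="D")
y2k = pd.Series(range(31), index=y2k_days)  # made-up values, one per day of January 2000

# A PeriodIndex goes one step further: each label is a whole interval, here a month.
months_2000 = pd.period_range("2000-01", periods=12, freq="M")
```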

Upsampling and downsampling Series

pandas offers the ability to move up and down the granularity of a time series. For example, given a Series s of random numbers for all the days in 2012, the sum for each month is calculated with the resample method and the 'M' (monthly) frequency.

Here, 'M' specifies that we're downsampling, from daily to monthly data. Upsampling, from a coarser to a finer interval, is done in a similar way, and pandas provides convenient functionality for filling in the disaggregated values.
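Here is a minimal sketch of both directions. s holds seeded random numbers. The sketch uses 'MS' (month start) instead of the book's 'M', because pandas 2.2 renamed the month-end alias to 'ME':

```python
import numpy as np
import pandas as pd

# s: seeded random numbers, one for each day of 2012.
rng = np.random.default_rng(42)
days = pd.date_range("2012-01-01", "2012-12-31", freq="D")
s = pd.Series(rng.standard_normal(len(days)), index=days)

# Downsampling: daily -> monthly totals, each month labelled by its first day.
monthly_sum = s.resample("MS").sum()

# Upsampling: monthly -> daily. The new days have no values yet, so carry each
# month's total forward until the next one.
daily_again = monthly_sum.resample("D").ffill()
```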



