NumPy is the fundamental package for scientific computing with Python. It contains among other things:
1)a powerful N-dimensional array object
2)sophisticated (broadcasting) functions
3)tools for integrating C/C++ and Fortran code
4) useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
NumPy几乎是一个无法回避的科学计算工具包,最常用的也许是它的N维数组对象,其他还包括一些成熟的函数库,用于整合C/C++和Fortran代码的工具包,线性代数、傅里叶变换和随机数生成函数等。NumPy提供了两种基本的对象:ndarray(N-dimensional array object)和 ufunc(universal function object)。ndarray是存储单一数据类型的多维数组,而ufunc则是能够对数组进行处理的函数。
3. scipy: Python Data Analysis Library
SciPy refers to several related but distinct entities:
1)The SciPy Stack, a collection of open source software for scientific computing in Python, and particularly a specified set of core packages.
2)The community of people who use and develop this stack.
3)Several conferences dedicated to scientific computing in Python – SciPy, EuroSciPy and SciPy.in.
4)The SciPy library, one component of the SciPy stack, providing many numerical routines.
matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in python scripts, the python and ipython shell (ala MATLAB®* or Mathematica®†), web application servers, and six graphical user interface toolkits.
You didn’t write that awful page. You’re just trying to get some data out of it. Beautiful Soup is here to help. Since 2004, it’s been saving programmers hours or days of work on quick-turnaround screen scraping projects.
爬虫工具
2. pandas: Python Data Analysis Library
Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and an active discussion forum.
自然语言处理包
第三部分 其他重要包
1. conda
Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.