A summary of improvements from Python 3.4 to Python 3.9

数据洞见

2021-12-12

Attached image: 对从 Python 3.4 到 Python 3.9 的提升提升情况的总结.png (a summary of improvements from Python 3.4 to Python 3.9)

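Scan the QR code to add me and I will bring you into the group

Please state: name-company-position

This is used to review group eligibility; requests without it will be rejected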

All replies

数据洞见

2021-12-12 13:31:26

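Extension modules: the m_traverse, m_clear and m_free functions of PyModuleDef are no longer called if the module state was requested but is not yet allocated. This is the case immediately after the module is created and before the module is executed (the Py_mod_exec function). More precisely, these functions are not called if m_size is greater than 0 and the module state (as returned by PyModule_GetState()) is NULL.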


数据洞见

2021-12-12 13:33:06

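ipaddress
ipaddress now supports IPv6 scoped addresses (that is, IPv6 addresses with a %<scope_id> suffix).
Scoped IPv6 addresses can be parsed using ipaddress.IPv6Address. If present, the scope zone ID is available through the scope_id attribute.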
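A minimal sketch of the behaviour described above, assuming Python 3.9 or later. The interface name "eth0" is made up for illustration and does not need to exist on the machine.

```python
import ipaddress

# Parse a link-local IPv6 address that carries a zone (scope) identifier.
# "eth0" is a hypothetical interface name used only for this example.
scoped = ipaddress.IPv6Address("fe80::1%eth0")
print(scoped.scope_id)  # eth0

# An address written without a zone reports scope_id as None.
plain = ipaddress.IPv6Address("fe80::1")
print(plain.scope_id)  # None
```

On Python versions before 3.9 the first call is expected to raise an error, since the scoped form was not accepted there.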


三重虫

2021-12-13 20:42:15


数据洞见

2021-12-13 21:10:53

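The following exceptions are the ones that are usually raised.
exception AssertionError
Raised when an assert statement fails.
exception AttributeError
Raised when an attribute reference (see attribute-references) or assignment fails. (When an object does not support attribute references or attribute assignments at all, TypeError is raised.)
exception EOFError
Raised when the input() function hits an end-of-file condition (EOF) without reading any data. (Note that the io.IOBase.read() and io.IOBase.readline() methods return an empty string when they hit EOF.)
exception FloatingPointError
Not currently used.
exception GeneratorExit
Raised when a generator or coroutine is closed; see generator.close() and coroutine.close(). It inherits directly from BaseException instead of Exception, since it is technically not an error.
exception ImportError
Raised when the import statement has trouble trying to load a module. Also raised when the "from list" in from ... import contains a name that cannot be found.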
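To illustrate two of the entries above, here is a small sketch (the generator name ticker and the events list are made up for this example). It shows close() raising GeneratorExit inside a generator, that GeneratorExit is not a subclass of Exception, and a failed attribute reference raising AttributeError.

```python
events = []

def ticker():
    """Yield forever; record when the generator is closed."""
    try:
        while True:
            yield 1
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield.
        events.append("closed")
        raise  # re-raise so that close() completes normally

gen = ticker()
next(gen)    # advance to the first yield
gen.close()  # triggers GeneratorExit inside the generator

# A broad "except Exception" clause would not have intercepted it.
print(issubclass(GeneratorExit, Exception))      # False
print(issubclass(GeneratorExit, BaseException))  # True

# A failed attribute reference raises AttributeError.
try:
    (42).no_such_attribute
except AttributeError:
    events.append("attribute")

print(events)  # ['closed', 'attribute']
```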


数据洞见

2021-12-13 21:11:06

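exception ModuleNotFoundError
A subclass of ImportError which is raised by import when a module cannot be located. It is also raised when None is found in sys.modules.
New in version 3.6.
exception IndexError
Raised when a sequence subscript is out of range. (Slice indices are silently truncated to fall in the allowed range; if an index is not an integer, TypeError is raised.)
exception KeyError
Raised when a mapping (dictionary) key is not found in the set of existing keys.
exception KeyboardInterrupt
Raised when the user hits the interrupt key (normally Control-C or Delete). During execution, a check for interrupts is made regularly. The exception inherits from BaseException so that it is not accidentally caught by code that catches Exception, which would prevent the interpreter from exiting.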
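The lookup-related exceptions above can be seen side by side in a short sketch. The module name no_such_module_xyz_for_demo is a made-up name that is assumed not to be installed.

```python
import importlib

items = [10, 20, 30]
caught = []

# Slice indices are silently truncated to the valid range.
sliced = items[1:100]
print(sliced)  # [20, 30]

# A plain index that is out of range raises IndexError.
try:
    items[5]
except IndexError:
    caught.append("IndexError")

# A non-integer index raises TypeError instead.
bad_index = "1"
try:
    items[bad_index]
except TypeError:
    caught.append("TypeError")

# A missing dictionary key raises KeyError.
try:
    {"a": 1}["b"]
except KeyError:
    caught.append("KeyError")

# A module that cannot be located raises ModuleNotFoundError,
# which is caught here through its parent class ImportError.
try:
    importlib.import_module("no_such_module_xyz_for_demo")
except ImportError as exc:
    caught.append(type(exc).__name__)

print(caught)
```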


数据洞见

2021-12-13 21:11:16

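exception MemoryError
Raised when an operation runs out of memory but the situation may still be rescued (by deleting some objects). The associated value is a string indicating what kind of (internal) operation ran out of memory. Note that because of the underlying memory management architecture (C's malloc() function), the interpreter may not always be able to recover completely from this situation; it nevertheless raises an exception so that a stack traceback can be printed, in case a runaway program was the cause.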


数据洞见

2021-12-13 21:11:34

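exception NotImplementedError
This exception is derived from RuntimeError. In user-defined base classes, abstract methods should raise this exception when they require derived classes to override the method, or while the class is being developed to indicate that the real implementation still needs to be added.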
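A minimal sketch of the pattern described above. The Shape and Square classes are hypothetical names chosen for illustration.

```python
class Shape:
    """Base class whose area() must be overridden by subclasses."""

    def area(self):
        raise NotImplementedError("subclasses must override area()")


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


print(Square(3).area())  # 9

# Calling the method on the base class signals the missing override.
try:
    Shape().area()
    base_raised = False
except NotImplementedError:
    base_raised = True

print(base_raised)  # True
```

For stricter enforcement at instantiation time, the abc module is the usual alternative; the sketch above only shows the exception-based convention.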


数据洞见

2021-12-13 21:11:45

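exception OSError(errno, strerror[, filename[, winerror[, filename2]]])
This exception is raised when a system function returns a system-related error, including I/O failures such as "file not found" or "disk full" (but not illegal argument types or other incidental errors).
The second form of the constructor (the multi-argument form shown above) sets the corresponding attributes, described below. The attributes default to None if not specified. For backwards compatibility, if three arguments are passed, the args attribute contains only a 2-tuple of the first two constructor arguments.
The constructor often actually returns a subclass of OSError, as described under OS exceptions below. The particular subclass depends on the final errno value. This behaviour only occurs when constructing OSError directly or via an alias, and it is not inherited when subclassing.
errno
A numeric error code from the C variable errno.
winerror
Under Windows, this gives the native Windows error code. The errno attribute is then an approximate translation of that native error code into POSIX terms.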
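The constructor behaviour described above can be checked directly. This is a small sketch; the file name missing.txt and the subclass MyError are made up for the example.

```python
import errno

# Constructing OSError directly with ENOENT yields the matching subclass.
err = OSError(errno.ENOENT, "No such file or directory", "missing.txt")
print(type(err).__name__)  # FileNotFoundError

# With three arguments, args keeps only the first two.
print(err.args == (errno.ENOENT, "No such file or directory"))  # True
print(err.filename)  # missing.txt


# The subclass mapping is not inherited by user-defined subclasses.
class MyError(OSError):
    pass


custom = MyError(errno.ENOENT, "No such file or directory")
print(type(custom).__name__)  # MyError
```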


1jian.fun

2021-12-14 13:04:06


玩于股涨之中

2022-1-2 15:35:33

seaborn: statistical data visualization


玩于股涨之中

2022-1-2 15:35:52

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

For a brief introduction to the ideas behind the library, you can read the introductory notes or the paper. Visit the installation page to see how you can download the package and get started with it. You can browse the example gallery to see some of the things that you can do with seaborn, and then check out the tutorial or API reference to find out how.

To see the code or report a bug, please visit the GitHub repository. General support questions are most at home on stackoverflow or discourse, which have dedicated channels for seaborn.


数据洞见

2022-1-16 08:45:07

Building structured multi-plot grids
When exploring multi-dimensional data, a useful approach is to draw multiple instances of the same plot on different subsets of your dataset. This technique is sometimes called either “lattice” or “trellis” plotting, and it is related to the idea of “small multiples”. It allows a viewer to quickly extract a large amount of information about a complex dataset. Matplotlib offers good support for making figures with multiple axes; seaborn builds on top of this to directly link the structure of the plot to the structure of your dataset.

The figure-level functions are built on top of the objects discussed in this chapter of the tutorial. In most cases, you will want to work with those functions. They take care of some important bookkeeping that synchronizes the multiple plots in each grid. This chapter explains how the underlying objects work, which may be useful for advanced applications.

Conditional small multiples
The FacetGrid class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset. A FacetGrid can be drawn with up to three dimensions: row, col, and hue. The first two have obvious correspondence with the resulting array of axes; think of the hue variable as a third dimension along a depth axis, where different levels are plotted with different colors.

Each of relplot(), displot(), catplot(), and lmplot() uses this object internally, and each returns the object when it is finished so that it can be used for further tweaking.

The class is used by initializing a FacetGrid object with a dataframe and the names of the variables that will form the row, column, or hue dimensions of the grid. These variables should be categorical or discrete, and then the data at each level of the variable will be used for a facet along that axis. For example, say we wanted to examine differences between lunch and dinner in the tips dataset:

import seaborn as sns
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")


数据洞见

2022-1-16 08:45:35

Visualizing distributions of data
An early step in any effort to analyze or model data should be to understand how the variables are distributed. Techniques for distribution visualization can provide quick answers to many important questions. What range do the observations cover? What is their central tendency? Are they heavily skewed in one direction? Is there evidence for bimodality? Are there significant outliers? Do the answers to these questions vary across subsets defined by other variables?


数据洞见

2022-1-16 08:46:09

penguins = sns.load_dataset("penguins")
sns.displot(penguins, x="flipper_length_mm")
This plot immediately affords a few insights about the flipper_length_mm variable. For instance, we can see that the most common flipper length is about 195 mm, but the distribution appears bimodal, so this one number does not represent the data well.

Choosing the bin size
The size of the bins is an important parameter, and using the wrong bin size can mislead by obscuring important features of the data or by creating apparent features out of random variability. By default, displot()/histplot() choose a default bin size based on the variance of the data and the number of observations. But you should not be over-reliant on such automatic approaches, because they depend on particular assumptions about the structure of your data. It is always advisable to check that your impressions of the distribution are consistent across different bin sizes. To choose the size directly, set the binwidth parameter:

sns.displot(penguins, x="flipper_length_mm", binwidth=3)
In other circumstances, it may make more sense to specify the number of bins, rather than their size:

sns.displot(penguins, x="flipper_length_mm", bins=20)
One example of a situation where defaults fail is when the variable takes a relatively small number of integer values. In that case, the default bin width may be too small, creating awkward gaps in the distribution:

tips = sns.load_dataset("tips")
sns.displot(tips, x="size")
One approach would be to specify the precise bin breaks by passing an array to bins:

sns.displot(tips, x="size", bins=[1, 2, 3, 4, 5, 6, 7])
This can also be accomplished by setting discrete=True, which chooses bin breaks that represent the unique values in a dataset with bars that are centered on their corresponding value.

sns.displot(tips, x="size", discrete=True)
It’s also possible to visualize the distribution of a categorical variable using the logic of a histogram. Discrete bins are automatically set for categorical variables, but it may also be helpful to “shrink” the bars slightly to emphasize the categorical nature of the axis:

sns.displot(tips, x="day", shrink=.8)
Conditioning on other variables
Once you understand the distribution of a variable, the next step is often to ask whether features of that distribution differ across other variables in the dataset. For example, what accounts for the bimodal distribution of flipper lengths that we saw above? displot() and histplot() provide support for conditional subsetting via the hue semantic. Assigning a variable to hue will draw a separate histogram for each of its unique values and distinguish them by color:

sns.displot(penguins, x="flipper_length_mm", hue="species")
By default, the different histograms are “layered” on top of each other and, in some cases, they may be difficult to distinguish. One option is to change the visual representation of the histogram from a bar plot to a “step” plot:

sns.displot(penguins, x="flipper_length_mm", hue="species", element="step")
Alternatively, instead of layering each bar, they can be “stacked”, or moved vertically. In this plot, the outline of the full histogram will match the plot with only a single variable:

sns.displot(penguins, x="flipper_length_mm", hue="species", multiple="stack")
The stacked histogram emphasizes the part-whole relationship between the variables, but it can obscure other features (for example, it is difficult to determine the mode of the Adelie distribution). Another option is to “dodge” the bars, which moves them horizontally and reduces their width. This ensures that there are no overlaps and that the bars remain comparable in terms of height. But it only works well when the categorical variable has a small number of levels:

sns.displot(penguins, x="flipper_length_mm", hue="sex", multiple="dodge")
Because displot() is a figure-level function and is drawn onto a FacetGrid, it is also possible to draw each individual distribution in a separate subplot by assigning the second variable to col or row rather than (or in addition to) hue. This represents the distribution of each subset well, but it makes it more difficult to draw direct comparisons:

sns.displot(penguins, x="flipper_length_mm", col="sex")


数据洞见

2022-1-16 08:46:40

Normalized histogram statistics
Another point to note is that, when the subsets have unequal numbers of observations, comparing their distributions in terms of counts may not be ideal. One solution is to normalize the counts using the stat parameter:

sns.displot(penguins, x="flipper_length_mm", hue="species", stat="density")
By default, however, the normalization is applied to the entire distribution, so this simply rescales the height of the bars. By setting common_norm=False, each subset will be normalized independently:

sns.displot(penguins, x="flipper_length_mm", hue="species", stat="density", common_norm=False)
Density normalization scales the bars so that their areas sum to 1. As a result, the density axis is not directly interpretable. Another option is to normalize the bars so that their heights sum to 1. This makes most sense when the variable is discrete, but it is an option for all histograms:

sns.displot(penguins, x="flipper_length_mm", hue="species", stat="probability")
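To make the difference between the two normalizations concrete, here is a plain-Python sketch. It is not seaborn's implementation: the histogram helper, the sample values and the deliberately unequal bin edges are all made up for illustration.

```python
def histogram(values, bin_edges):
    """Count values per bin; bins are half-open except the last one."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            if lo <= v < hi or (i == last and v == hi):
                counts[i] += 1
                break
    return counts


values = [1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 4.5, 5.0]
edges = [1.0, 2.0, 3.0, 5.0]  # the last bin is twice as wide
counts = histogram(values, edges)
n = sum(counts)
widths = [edges[i + 1] - edges[i] for i in range(len(counts))]

# "probability": bar heights sum to 1.
probability = [c / n for c in counts]

# "density": bar areas (height * width) sum to 1.
density = [c / (n * w) for c, w in zip(counts, widths)]
areas = [d * w for d, w in zip(density, widths)]

print(counts)        # [2, 3, 3]
print(probability)   # [0.25, 0.375, 0.375]
print(density)       # [0.25, 0.375, 0.1875]
```

With equal-width bins the two normalizations differ only by a constant factor; the unequal last bin is used here so that the difference shows up in the numbers.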
Kernel density estimation
A histogram aims to approximate the underlying probability density function that generated the data by binning and counting observations. Kernel density estimation (KDE) presents a different solution to the same problem. Rather than using discrete bins, a KDE plot smooths the observations with a Gaussian kernel, producing a continuous density estimate:

sns.displot(penguins, x="flipper_length_mm", kind="kde")
Choosing the smoothing bandwidth
Much like with the bin size in the histogram, the ability of the KDE to accurately represent the data depends on the choice of smoothing bandwidth. An over-smoothed estimate might erase meaningful features, but an under-smoothed estimate can obscure the true shape within random noise. The easiest way to check the robustness of the estimate is to adjust the default bandwidth:

sns.displot(penguins, x="flipper_length_mm", kind="kde", bw_adjust=.25)
Note how the narrow bandwidth makes the bimodality much more apparent, but the curve is much less smooth. In contrast, a larger bandwidth obscures the bimodality almost completely:

sns.displot(penguins, x="flipper_length_mm", kind="kde", bw_adjust=2)
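The effect of the bandwidth can be reproduced with a hand-written Gaussian KDE. This is a sketch under stated assumptions: the gaussian_kde helper and the toy sample are made up, and the bandwidth is passed directly, whereas seaborn's bw_adjust multiplies an automatically chosen bandwidth.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function estimating the density with Gaussian kernels."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# A toy bimodal sample: one cluster near 1 and another near 5.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]

narrow = gaussian_kde(data, bandwidth=0.3)
wide = gaussian_kde(data, bandwidth=3.0)

# With the narrow bandwidth, the valley between the clusters is visible.
print(narrow(3.0) < narrow(1.0))  # True

# With the wide bandwidth, the midpoint is higher than the cluster
# location, so the two modes have merged into a single hump.
print(wide(3.0) > wide(1.0))  # True
```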
Conditioning on other variables
As with histograms, if you assign a hue variable, a separate density estimate will be computed for each level of that variable:

sns.displot(penguins, x="flipper_length_mm", hue="species", kind="kde")
In many cases, the layered KDE is easier to interpret than the layered histogram, so it is often a good choice for the task of comparison. Many of the same options for resolving multiple distributions apply to the KDE as well, however:

sns.displot(penguins, x="flipper_length_mm", hue="species", kind="kde", multiple="stack")
Note how the stacked plot filled in the area between each curve by default. It is also possible to fill in the curves for single or layered densities, although the default alpha value (opacity) will be different, so that the individual densities are easier to resolve.

sns.displot(penguins, x="flipper_length_mm", hue="species", kind="kde", fill=True)
Kernel density estimation pitfalls
KDE plots have many advantages. Important features of the data are easy to discern (central tendency, bimodality, skew), and they afford easy comparisons between subsets. But there are also situations where KDE poorly represents the underlying data. This is because the logic of KDE assumes that the underlying distribution is smooth and unbounded. One way this assumption can fail is when a variable reflects a quantity that is naturally bounded. If there are observations lying close to the bound (for example, small values of a variable that cannot be negative), the KDE curve may extend to unrealistic values:

sns.displot(tips, x="total_bill", kind="kde")
This can be partially avoided with the cut parameter, which specifies how far the curve should extend beyond the extreme datapoints. But this influences only where the curve is drawn; the density estimate will still smooth over the range where no data can exist, causing it to be artificially low at the extremes of the distribution:


数据洞见

2022-1-16 08:47:11

sns.displot(tips, x="total_bill", kind="kde", cut=0)
The KDE approach also fails for discrete data or when data are naturally continuous but specific values are over-represented. The important thing to keep in mind is that the KDE will always show you a smooth curve, even when the data themselves are not smooth. For example, consider this distribution of diamond weights:

diamonds = sns.load_dataset("diamonds")
sns.displot(diamonds, x="carat", kind="kde")
While the KDE suggests that there are peaks around specific values, the histogram reveals a much more jagged distribution:

sns.displot(diamonds, x="carat")
As a compromise, it is possible to combine these two approaches. While in histogram mode, displot() (as with histplot()) has the option of including the smoothed KDE curve (note kde=True, not kind="kde"):

sns.displot(diamonds, x="carat", kde=True)
Empirical cumulative distributions
A third option for visualizing distributions computes the “empirical cumulative distribution function” (ECDF). This plot draws a monotonically-increasing curve through each datapoint such that the height of the curve reflects the proportion of observations with a smaller value:

sns.displot(penguins, x="flipper_length_mm", kind="ecdf")
The ECDF plot has two key advantages. Unlike the histogram or KDE, it directly represents each datapoint. That means there is no bin size or smoothing parameter to consider. Additionally, because the curve is monotonically increasing, it is well-suited for comparing multiple distributions:

sns.displot(penguins, x="flipper_length_mm", hue="species", kind="ecdf")
The major downside to the ECDF plot is that it represents the shape of the distribution less intuitively than a histogram or density curve. Consider how the bimodality of flipper lengths is immediately apparent in the histogram, but to see it in the ECDF plot, you must look for varying slopes. Nevertheless, with practice, you can learn to answer all of the important questions about a distribution by examining the ECDF, and doing so can be a powerful approach.
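For readers who want to see what the ECDF computes, here is a plain-Python sketch (the ecdf helper and the sample values are made up; this is not seaborn's code). It uses the common convention that the height at a point is the proportion of observations at or below that value.

```python
def ecdf(values):
    """Return (value, cumulative proportion) pairs in sorted order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


points = ecdf([3, 1, 2, 2])
print(points)  # [(1, 0.25), (2, 0.5), (2, 0.75), (3, 1.0)]

# The heights never decrease and always end at 1.
heights = [h for _, h in points]
print(heights == sorted(heights))  # True
```

Because every observation contributes one step, no bin size or bandwidth has to be chosen, which is the first advantage mentioned above.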

Visualizing bivariate distributions
All of the examples so far have considered univariate distributions: distributions of a single variable, perhaps conditional on a second variable assigned to hue. Assigning a second variable to y, however, will plot a bivariate distribution:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm")
A bivariate histogram bins the data within rectangles that tile the plot and then shows the count of observations within each rectangle with the fill color (analogous to a heatmap()). Similarly, a bivariate KDE plot smooths the (x, y) observations with a 2D Gaussian. The default representation then shows the contours of the 2D density:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind="kde")
Assigning a hue variable will plot multiple heatmaps or contour sets using different colors. For bivariate histograms, this will only work well if there is minimal overlap between the conditional distributions:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
The contour approach of the bivariate KDE plot lends itself better to evaluating overlap, although a plot with too many contours can get busy:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", hue="species", kind="kde")
Just as with univariate plots, the choice of bin size or smoothing bandwidth will determine how well the plot represents the underlying bivariate distribution. The same parameters apply, but they can be tuned for each variable by passing a pair of values:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", binwidth=(2, .5))
To aid interpretation of the heatmap, add a colorbar to show the mapping between counts and color intensity:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", binwidth=(2, .5), cbar=True)
The meaning of the bivariate density contours is less straightforward. Because the density is not directly interpretable, the contours are drawn at iso-proportions of the density, meaning that each curve shows a level set such that some proportion p of the density lies below it. The p values are evenly spaced, with the lowest level controlled by the thresh parameter and the number controlled by levels:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind="kde", thresh=.2, levels=4)
The levels parameter also accepts a list of values, for more control:

sns.displot(penguins, x="bill_length_mm", y="bill_depth_mm", kind="kde", levels=[.01, .05, .1, .8])


数据洞见

2022-1-16 08:47:59

Plotting joint and marginal distributions
The jointplot() function augments a bivariate relational or distribution plot with the marginal distributions of the two variables. By default, jointplot() represents the bivariate distribution using scatterplot() and the marginal distributions using histplot():

sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
Similar to displot(), setting a different kind, such as kind="kde", in jointplot() will change both the joint and marginal plots to use kdeplot():

sns.jointplot(
    data=penguins,
    x="bill_length_mm", y="bill_depth_mm", hue="species",
    kind="kde"
)
jointplot() is a convenient interface to the JointGrid class, which offers more flexibility when used directly:

g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.histplot)
g.plot_marginals(sns.boxplot)
A less-obtrusive way to show marginal distributions uses a “rug” plot, which adds a small tick on the edge of the plot to represent each individual observation. This is built into displot():

sns.displot(
    penguins, x="bill_length_mm", y="bill_depth_mm",
    kind="kde", rug=True
)
And the axes-level rugplot() function can be used to add rugs on the side of any other kind of plot:

sns.relplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
sns.rugplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
Plotting many distributions
The pairplot() function offers a similar blend of joint and marginal distributions. Rather than focusing on a single relationship, however, pairplot() uses a “small-multiple” approach to visualize the univariate distribution of all variables in a dataset along with all of their pairwise relationships:

sns.pairplot(penguins)
As with jointplot()/JointGrid, using the underlying PairGrid directly will afford more flexibility with only a bit more typing:

g = sns.PairGrid(penguins)
g.map_upper(sns.histplot)
g.map_lower(sns.kdeplot, fill=True)
g.map_diag(sns.histplot, kde=True)


数据洞见

2022-1-16 08:48:19

Visualizing regression models
Many datasets contain multiple quantitative variables, and the goal of an analysis is often to relate those variables to each other. We previously discussed functions that can accomplish this by showing the joint distribution of two variables. It can be very helpful, though, to use statistical models to estimate a simple relationship between two noisy sets of observations. The functions discussed in this chapter will do so through the common framework of linear regression.

In the spirit of Tukey, the regression plots in seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analyses. That is to say that seaborn is not itself a package for statistical analysis. To obtain quantitative measures related to the fit of regression models, you should use statsmodels. The goal of seaborn, however, is to make exploring a dataset through visualization quick and easy, as doing so is just as (if not more) important than exploring a dataset through tables of statistics.


数据洞见

2022-1-16 08:48:52

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(color_codes=True)
tips = sns.load_dataset("tips")
Functions to draw linear regression models
Two main functions in seaborn are used to visualize a linear relationship as determined through regression. These functions, regplot() and lmplot(), are closely related and share much of their core functionality. It is important to understand the ways they differ, however, so that you can quickly choose the correct tool for a particular job.

In the simplest invocation, both functions draw a scatterplot of two variables, x and y, and then fit the regression model y ~ x and plot the resulting regression line and a 95% confidence interval for that regression:

sns.regplot(x="total_bill", y="tip", data=tips);
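The regression line in such a plot comes from fitting y ~ x. As a rough illustration of that fit (not of seaborn's internals, and without the confidence interval), here is an ordinary least squares sketch in plain Python. The fit_line helper and the bill/tip numbers are made up and chosen to be exactly linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical bills and tips lying exactly on tip = 0.5 + 0.15 * bill.
total_bill = [10.0, 20.0, 30.0, 40.0]
tip = [2.0, 3.5, 5.0, 6.5]

slope, intercept = fit_line(total_bill, tip)
print(round(slope, 6), round(intercept, 6))  # 0.15 0.5
```

For quantitative measures of fit on real data, a statistics package such as statsmodels is the better tool, as the tutorial text itself points out.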


数据洞见

2022-1-16 08:49:11

Similar functions for similar tasks
The seaborn namespace is flat; all of the functionality is accessible at the top level. But the code itself is hierarchically structured, with modules of functions that achieve similar visualization goals through different means. Most of the docs are structured around these modules: you’ll encounter names like “relational”, “distributional”, and “categorical”.

For example, the distributions module defines functions that specialize in representing the distribution of datapoints. This includes familiar methods like the histogram:

penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm", hue="species", multiple="stack")
Along with similar, but perhaps less familiar, options such as kernel density estimation:

sns.kdeplot(data=penguins, x="flipper_length_mm", hue="species", multiple="stack")
Functions within a module share a lot of underlying code and offer similar features that may not be present in other components of the library (such as multiple="stack" in the examples above). They are designed to facilitate switching between different visual representations as you explore a dataset, because different representations often have complementary strengths and weaknesses.

Figure-level vs. axes-level functions
In addition to the different modules, there is a cross-cutting classification of seaborn functions as “axes-level” or “figure-level”. The examples above are axes-level functions. They plot data onto a single matplotlib.pyplot.Axes object, which is the return value of the function.

In contrast, figure-level functions interface with matplotlib through a seaborn object, usually a FacetGrid, that manages the figure. Each module has a single figure-level function, which offers a unitary interface to its various axes-level functions. The organization looks a bit like this:

[Figure: diagram of each module's figure-level function and its axes-level functions]
For example, displot() is the figure-level function for the distributions module. Its default behavior is to draw a histogram, using the same code as histplot() behind the scenes:

sns.displot(data=penguins, x="flipper_length_mm", hue="species", multiple="stack")


数据洞见

2022-1-16 08:49:42

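Visualizing statistical relationships

Statistical analysis is a process of understanding how the variables in a dataset relate to each other and how those relationships depend on other variables. Visualization is a core component of this process because, when data are visualized properly, the human visual system can see trends and patterns that indicate a relationship.

We will discuss three seaborn functions in this tutorial. The one we will use most is relplot(). This is a figure-level function for visualizing statistical relationships using two common approaches: scatter plots and line plots. relplot() combines a FacetGrid with one of two axes-level functions:

scatterplot() (with kind="scatter"; the default)
lineplot() (with kind="line")
As we will see, these functions can be quite illuminating because they use simple and easily understood representations of data that can nevertheless represent complex dataset structures. They can do so because the two-dimensional plots they draw can be enhanced by mapping up to three additional variables using the semantics of hue, size, and style.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="darkgrid")
Relating variables with scatter plots
The scatter plot is a mainstay of data visualization. It depicts the joint distribution of two variables using a cloud of points, where each point represents an observation in the dataset. This depiction allows the eye to infer a substantial amount of information about whether there is any meaningful relationship between them.

There are several ways to draw a scatter plot in seaborn. The most basic, which should be used when both variables are numeric, is the scatterplot() function. In the categorical visualization tutorial, we will see specialized tools for using scatter plots to visualize categorical data. scatterplot() is the default kind in relplot() (it can also be selected by setting kind="scatter"):

tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", data=tips);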


数据洞见

2022-1-16 08:50:13

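Visualizing linear relationships

Many datasets contain multiple quantitative variables, and the goal of an analysis is often to relate those variables to each other. We previously discussed functions that can accomplish this by showing how two variables relate. It can be very helpful, though, to use statistical models to estimate a simple relationship between two noisy sets of observations. The functions discussed in this chapter do so through the common framework of linear regression.

In the spirit of Tukey, the regression plots in seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analysis. In other words, seaborn is not itself a package for statistical analysis. To obtain quantitative measures related to the fit of regression models, you should use statsmodels. The goal of seaborn, however, is to make exploring a dataset through visualization quick and easy, as doing so is just as important as (if not more important than) exploring a dataset through tables of statistics.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(color_codes=True)
tips = sns.load_dataset("tips")
Functions to draw linear regression models
Two main functions in seaborn are used to visualize a linear relationship as determined through regression. These functions, regplot() and lmplot(), are closely related and share much of their core functionality. It is important to understand the ways they differ, however, so that you can quickly choose the correct tool for a particular job.

In the simplest invocation, both functions draw a scatterplot of two variables, x and y, and then fit the regression model y ~ x and plot the resulting regression line and a 95% confidence interval for that regression:

sns.regplot(x="total_bill", y="tip", data=tips)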


数据洞见

2022-1-16 08:50:38

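Fitting different kinds of models
The simple linear regression model used above is very easy to fit, but it is not appropriate for some kinds of datasets. The Anscombe's quartet dataset shows a few examples where simple linear regression provides an identical estimate of a relationship, yet simple visual inspection clearly shows differences. For example, in the first case, linear regression is a good model:

anscombe = sns.load_dataset("anscombe")
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
           ci=None, scatter_kws={"s": 80});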


数据洞见

2022-1-16 08:51:49

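Building structured multi-plot grids

When exploring medium-dimensional data, a useful approach is to draw multiple instances of the same plot on different subsets of your dataset. This technique is sometimes called either "lattice" or "trellis" plotting, and it is related to the idea of "small multiples". It allows a viewer to quickly extract a large amount of information about complex data. Matplotlib offers good support for making figures with multiple axes; seaborn builds on top of this to directly link the structure of the plot to the structure of your dataset.

To use these features, your data has to be in a Pandas DataFrame and it must take the form of what Hadley Wickham calls "tidy" data. In brief, the dataframe used for plotting should be structured so that each column is a variable and each row is an observation.

For advanced use, you can use the objects discussed in this tutorial directly, which provides maximum flexibility. Some seaborn functions (such as lmplot(), catplot(), and pairplot()) also use them behind the scenes. Unlike other seaborn functions that are "Axes-level" and draw onto a specific (possibly already existing) matplotlib Axes without otherwise manipulating the figure, these higher-level functions create a figure when called and are generally stricter about how it gets set up. In some cases, arguments to those functions or to the constructor of the class they rely on provide a different interface to attributes such as the figure size; in lmplot(), for example, you set the height and aspect ratio of each facet. Any function that uses one of these objects returns it after plotting, though, and most of these objects have convenient, simple methods for changing how the plot is drawn.

import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="ticks")

Conditional small multiples
The FacetGrid class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset. A FacetGrid can be drawn with up to three dimensions: row, col, and hue. The first two have an obvious correspondence with the resulting array of axes; think of the hue variable as a third dimension along a depth axis, where different levels are plotted with different colors.

The class is used by initializing a FacetGrid object with a dataframe and the names of the variables that will form the row, column, or hue dimensions of the grid. These variables should be discrete, and the data at each level of the variable will then be used for a facet along that axis. For example, say we wanted to examine differences between lunch and dinner in the distribution of tips in the tips dataset.

Additionally, relplot(), catplot(), and lmplot() each use this object internally, and they return the object when they are finished so that it can be used for further tweaking.

tips = sns.load_dataset("tips")

g = sns.FacetGrid(tips, col="time")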


数据洞见

2022-1-16 08:52:25

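Controlling figure aesthetics

Drawing attractive figures is important. When you are exploring a dataset and making figures for yourself, it is nice to have plots that are pleasant to look at. Visualizations are also central to communicating quantitative insights to an audience, and in that setting it is even more necessary to have figures that catch the viewer's attention and hold it.

Matplotlib is highly customizable, but it can be hard to know which settings to tweak to achieve an attractive plot. Seaborn comes with a number of customized themes and a high-level interface for controlling the look of matplotlib figures.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Let's define a simple function to plot some offset sine waves, which will help us see the different stylistic parameters we can tweak.

def sinplot(flip=1):
    x = np.linspace(0, 14, 100)
    for i in range(1, 7):
        plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)
This is what the plot looks like with matplotlib defaults:

sinplot()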


宽客老丁

2022-1-16 08:53:30

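(Note that in versions of seaborn prior to 0.8, set() was called automatically when the library was imported with an import statement. In later versions, it must be called explicitly.)

Seaborn splits matplotlib parameters into two independent groups. The first group sets the aesthetic style of the plot, and the second controls the scaling of the various elements of the figure so that it can be easily incorporated into different contexts.

The interface for manipulating these parameters is two pairs of functions. To control the style, use the axes_style() and set_style() functions. To scale the elements of the plot, use the plotting_context() and set_context() functions. In both cases, the first function returns a dictionary of parameters and the second sets the matplotlib defaults for those parameters.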


宽客老丁

2022-1-16 08:53:52

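Visualizing the distribution of a dataset

When dealing with a set of data, often the first thing you will want to do is get a sense of how the variables are distributed. This chapter of the tutorial gives a brief introduction to some of the tools in seaborn for examining univariate and bivariate distributions. You may also want to look at the categorical plots chapter (categorical.html#categorical-tutorial) for examples of functions that make it easy to compare the distribution of a variable across levels of other variables.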


宽客老丁

2022-1-16 08:54:38



宽客老丁

2022-1-16 08:55:08


strerror
The corresponding error message, as provided by the operating system. It is formatted by the C function perror() under POSIX, and by FormatMessage() under Windows.
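A small sketch of how this attribute relates to the operating system's message for an error code. The exact wording of the message is platform-dependent, so the example only compares the two values rather than showing a fixed string.

```python
import errno
import os

# Ask the operating system for the message that belongs to ENOENT.
message = os.strerror(errno.ENOENT)

# Build an OSError carrying that code and message.
err = OSError(errno.ENOENT, message)

print(err.errno == errno.ENOENT)  # True
print(err.strerror == message)    # True
```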
