Grouping DataFrame with Index levels and columns
A DataFrame may be grouped by a combination of columns and index levels by specifying the column names as strings and the index levels as pd.Grouper objects.
Put differently, columns and index levels can be mixed in one grouping: each column is referred to by its name, and each index level is wrapped in a pd.Grouper object.
In [46]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ....:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
   ....:
In [47]: index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
In [48]: df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
   ....:                    'B': np.arange(8)},
   ....:                   index=index)
   ....:
In [49]: df
Out[49]:
              A  B
first second
bar   one     1  0
      two     1  1
baz   one     1  2
      two     1  3
foo   one     2  4
      two     2  5
qux   one     3  6
      two     3  7
The code above builds the DataFrame df.
The following example groups df by the second index level (position 1, named second) and the A column.
In [50]: df.groupby([pd.Grouper(level=1), 'A']).sum()
Out[50]:
          B
second A
one    1  2
       2  4
       3  6
two    1  4
       2  5
       3  7
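To make it clearer what the mixed grouping does, the sketch below compares it with a two-step alternative: move the index level into an ordinary column with reset_index, then group on two columns. The reset_index route is an illustration added here, not part of the original example, and the names via_grouper and via_reset are made up for this snippet.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame so this snippet runs on its own.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                   'B': np.arange(8)},
                  index=index)

# Group on index level 1 together with column A.
via_grouper = df.groupby([pd.Grouper(level=1), 'A']).sum()

# Alternative (illustrative): turn the level into a column, then group on two columns.
via_reset = df.reset_index(level='second').groupby(['second', 'A']).sum()

# Both routes should yield the same group keys and the same sums of B.
print(list(via_grouper.index) == list(via_reset.index))
print(via_grouper['B'].tolist() == via_reset['B'].tolist())
```

The pd.Grouper form avoids the intermediate copy with the extra column, which is presumably why the documentation presents it as the direct way to mix the two kinds of keys.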
Index levels may also be specified by name. Here level=1 can be replaced by level='second', that is, by the name of the level.
In [51]: df.groupby([pd.Grouper(level='second'), 'A']).sum()
Out[51]:
          B
second A
one    1  2
       2  4
       3  6
two    1  4
       2  5
       3  7
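One practical difference between the two spellings is that a name keeps pointing at the same level even if the levels are reordered, while a position does not. The following sketch illustrates this with swaplevel, which is not used in the original example; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame so this snippet runs on its own.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                   'B': np.arange(8)},
                  index=index)

# Swap the two index levels: 'second' moves to position 0, 'first' to position 1.
swapped = df.swaplevel()

# By name: still groups on the 'second' level.
by_name_swapped = swapped.groupby([pd.Grouper(level='second'), 'A']).sum()

# By position: level 1 is now 'first', so the grouping changes.
by_pos_swapped = swapped.groupby([pd.Grouper(level=1), 'A']).sum()

print(list(by_name_swapped.index.names))
print(list(by_pos_swapped.index.names))
```

When the index layout might change upstream, referring to levels by name is therefore the safer choice.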
Index level names may also be passed directly to groupby as keys, in the same list as column names and without wrapping them in pd.Grouper.
In [52]: df.groupby(['second', 'A']).sum()
Out[52]:
          B
second A
one    1  2
       2  4
       3  6
two    1  4
       2  5
       3  7
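As a quick check, the sketch below confirms that the plain-string spelling gives the same sums as the pd.Grouper spelling on this frame. For contrast it also shows the level= keyword of groupby, which is not covered in the text above and, used on its own, groups by the index level only, so column A is summed instead of serving as a key. Variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame so this snippet runs on its own.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                   'B': np.arange(8)},
                  index=index)

# Plain strings: 'second' resolves to the index level, 'A' to the column.
by_keys = df.groupby(['second', 'A']).sum()

# Explicit Grouper for the index level, for comparison.
by_grouper = df.groupby([pd.Grouper(level='second'), 'A']).sum()
print(by_keys['B'].tolist() == by_grouper['B'].tolist())

# level= keyword alone: groups by the index level only, so both A and B are summed.
by_level_only = df.groupby(level='second').sum()
print(by_level_only)
```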