Two Twitter social network datasets
1. Data sources: http://snap.stanford.edu/data/egonets-Twitter.html (Social circles: Twitter); http://snap.stanford.edu/data/higgs-twitter.html (Higgs Twitter Dataset)
2. Time span: not specified (Twitter data)
3. Geographic scope: nationwide
4. Field descriptions:
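(1) Social circles: Twitter
This dataset consists of "circles" (or "lists") from Twitter. The Twitter data was collected from public sources. The dataset includes node features (profiles), circles, and ego networks.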
| Dataset statistics |
| Nodes | 81306 |
| Edges | 1768149 |
| Nodes in largest WCC | 81306 (1.000) |
| Edges in largest WCC | 1768149 (1.000) |
| Nodes in largest SCC | 68413 (0.841) |
| Edges in largest SCC | 1685163 (0.953) |
| Average clustering coefficient | 0.5653 |
| Number of triangles | 13082506 |
| Fraction of closed triangles | 0.06415 |
| Diameter (longest shortest path) | 7 |
| 90-percentile effective diameter | 4.5 |
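The parenthesized values in the rows above read as fractions of the whole network's node or edge count. A minimal sketch that checks this reading against the figures in the table (the variable and function names are illustrative, not part of the dataset):

```python
# Totals and largest-SCC sizes copied from the statistics rows above.
nodes_total = 81306
edges_total = 1768149
scc_nodes = 68413
scc_edges = 1685163


def fraction(part: int, total: int) -> str:
    """Return part/total formatted to three decimals, as in the table."""
    return f"{part / total:.3f}"


scc_node_fraction = fraction(scc_nodes, nodes_total)
scc_edge_fraction = fraction(scc_edges, edges_total)
print(scc_node_fraction, scc_edge_fraction)
```

Both results match the table when rounded to three decimals, which supports reading "(0.841)" as "84.1% of all nodes".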
Citation:
J. McAuley and J. Leskovec. Learning to Discover Social Circles in Ego Networks. NIPS, 2012.
(2) Higgs Twitter Dataset
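The Higgs dataset was built by monitoring the spreading processes on Twitter before, during, and after the announcement, on July 4, 2012, of the discovery of a new particle with the features of the Higgs boson. The monitored period ends on July 7, 2012.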
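The four directed networks made available here have been extracted from user activities on Twitter as:
- retweeting (retweet network)
- replying (reply network) to existing tweets
- mentioning (mention network) other users
- friend/follower social relationships among the users involved in the above activities
- information about activity on Twitter during the discovery of the Higgs boson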
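Note that the user IDs have been anonymized and that all networks use the same user IDs. This choice allows the Higgs dataset to be used in studies of large-scale interdependent/interconnected multiplex/multilayer networks, where one layer accounts for the social structure and three layers encode different types of user dynamics.
This dataset was last updated on March 31, 2015.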
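Because all four networks share one anonymized ID space, the layers can be joined directly on user ID. The sketch below illustrates that idea on toy data. The edges, the layer names, and the `edges_by_layer` structure are assumptions made for illustration and are not taken from the Higgs files, which would first have to be parsed into this form.

```python
from collections import defaultdict

# Toy directed edges (source_id, target_id) per layer. These IDs are made up
# for illustration and are not taken from the Higgs files.
edges_by_layer = {
    "social": [(1, 2), (2, 1), (3, 1), (4, 1)],
    "retweet": [(2, 1), (3, 1)],
    "reply": [(2, 1)],
    "mention": [(4, 3)],
}


def layers_per_user(edges_by_layer):
    """Map each user ID to the set of layers in which it appears."""
    membership = defaultdict(set)
    for layer, edges in edges_by_layer.items():
        for source, target in edges:
            membership[source].add(layer)
            membership[target].add(layer)
    return dict(membership)


def edge_overlap(edges_by_layer, layer_a, layer_b):
    """Return directed edges present in both layers (same source and target)."""
    return set(edges_by_layer[layer_a]) & set(edges_by_layer[layer_b])


membership = layers_per_user(edges_by_layer)
shared = edge_overlap(edges_by_layer, "social", "retweet")
print(sorted(membership[1]))
print(sorted(shared))
```

Keeping one edge list per layer, keyed by a shared ID, is one simple way to represent a multiplex network. Whether an edge points the same way in every layer depends on the conventions of the actual files, so check that before comparing layers.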
| Social Network statistics |
| Nodes | 456626 |
| Edges | 14855842 |
| Nodes in largest WCC | 456290 (0.999) |
| Edges in largest WCC | 14855466 (1.000) |
| Nodes in largest SCC | 360210 (0.789) |
| Edges in largest SCC | 14102605 (0.949) |
| Average clustering coefficient | 0.1887 |
| Number of triangles | 83023401 |
| Fraction of closed triangles | 0.002901 |
| Diameter (longest shortest path) | 9 |
| 90-percentile effective diameter | 3.7 |
| Retweet Network statistics |
| Nodes | 256491 |
| Edges | 328132 |
| Nodes in largest WCC | 223833 (0.873) |
| Edges in largest WCC | 308596 (0.940) |
| Nodes in largest SCC | 984 (0.004) |
| Edges in largest SCC | 3850 (0.012) |
| Average clustering coefficient | 0.0156 |
| Number of triangles | 21172 |
| Fraction of closed triangles | 0.0001085 |
| Diameter (longest shortest path) | 19 |
| 90-percentile effective diameter | 6.8 |
| Reply Network statistics |
| Nodes | 38918 |
| Edges | 32523 |
| Nodes in largest WCC | 12839 (0.330) |
| Edges in largest WCC | 14944 (0.459) |
| Nodes in largest SCC | 322 (0.008) |
| Edges in largest SCC | 708 (0.022) |
| Average clustering coefficient | 0.0058 |
| Number of triangles | 244 |
| Fraction of closed triangles | 0.0001561 |
| Diameter (longest shortest path) | 29 |
| 90-percentile effective diameter | 10 |
| Mention Network statistics |
| Nodes | 116408 |
| Edges | 150818 |
| Nodes in largest WCC | 91606 (0.787) |
| Edges in largest WCC | 132068 (0.876) |
| Nodes in largest SCC | 1801 (0.015) |
| Edges in largest SCC | 7069 (0.047) |
| Average clustering coefficient | 0.0825 |
| Number of triangles | 23068 |
| Fraction of closed triangles | 0.0002417 |
| Diameter (longest shortest path) | 18 |
| 90-percentile effective diameter | 6.5 |
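The tables above report the largest weakly connected component (WCC, connectivity when edge direction is ignored) and the largest strongly connected component (SCC, nodes that can all reach one another along directed edges). The sketch below computes both sizes on a small toy graph using only the standard library. The toy graph and function names are illustrative, and Kosaraju's algorithm is used here as one common choice, not necessarily the method behind the published statistics.

```python
from collections import defaultdict


def largest_wcc_size(nodes, edges):
    """Size of the largest weakly connected component (direction ignored)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def largest_scc_size(nodes, edges):
    """Size of the largest strongly connected component (Kosaraju, iterative)."""
    forward, backward = defaultdict(list), defaultdict(list)
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)
    # Pass 1: record nodes in order of DFS completion on the forward graph.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # All successors explored: this node is finished.
                order.append(node)
                stack.pop()
    # Pass 2: sweep the reversed graph in decreasing completion order.
    seen, best = set(), 0
    for start in reversed(order):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in backward[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


# Toy graph: a cycle 1 -> 2 -> 3 -> 1, a one-way tail 3 -> 4 -> 5,
# and a separate pair 6 -> 7.
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (6, 7)]
wcc = largest_wcc_size(nodes, edges)
scc = largest_scc_size(nodes, edges)
print(wcc, scc)
```

In the toy graph the one-way tail belongs to the largest WCC but not to the largest SCC. The same effect makes the SCC much smaller than the WCC in a directed network where most edges are not reciprocated.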
Citation:
M. De Domenico, A. Lima, P. Mougel and M. Musolesi. The Anatomy of a Scientific Rumor. (Nature Open Access) Scientific Reports 3, 2980 (2013).