Text Mining: Ukraine Tweet Network Analysis in R

2015-01-22

#Ukraine Tweets as a Network
Certain key terms appear throughout the tweets and connect the #Ukraine tweets together. Removing them improves our ability to see the underlying connections (besides the obvious ones) and simplifies the network graph. So here I chose to remove "ukraine", "prorussian", and "russia".

You might remember that last time, to create an adjacency matrix for the terms, we multiplied the term-document matrix by its transpose. Here we perform the same matrix multiplication in the opposite order to create an adjacency matrix for the tweets (documents). This time we need the transpose of the tweet (term-document) matrix multiplied by the tweet matrix, so that the tweets (docs) are multiplied together.
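
The two steps above (dropping the hub terms, then multiplying in the opposite order) can be sketched on a toy matrix. This is only an illustration under stated assumptions: the matrix `tdm`, its terms, and its four tweets are hypothetical, and the binarization step is an assumption added so that each entry counts shared terms rather than multiplying term frequencies.

```r
# Toy term-document matrix (hypothetical data): rows are terms, columns are tweets.
tdm <- matrix(
  c(1, 1, 1, 1,   # ukraine (hub term, present in every tweet)
    1, 0, 1, 0,   # nato
    0, 1, 1, 0,   # crimea
    0, 0, 0, 1),  # sanctions
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ukraine", "nato", "crimea", "sanctions"),
                  paste0("tweet", 1:4))
)

# Drop the hub terms; names that are absent from the matrix are simply ignored.
hub_terms <- c("ukraine", "prorussian", "russia")
tdm <- tdm[!(rownames(tdm) %in% hub_terms), , drop = FALSE]

# Assumption: binarize so each product counts shared terms, not frequencies.
tdm[tdm >= 1] <- 1

# Term adjacency would be tdm %*% t(tdm); tweet adjacency reverses the order.
tweet_adj <- t(tdm) %*% tdm
tweet_adj
```

With the hub term removed, tweet1 and tweet2 no longer share anything, while tweet3 still shares one term with each of them.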

The tweet adjacency matrix shows how many terms any two documents have in common. For example, tweet 9 has 1 term in common with tweet 6. The matrix is symmetric, so the number is the same whether you start from tweet 9 or from tweet 6 and compare it with the other.

Now we are ready for plotting the network graphic.

Visualizing the Network
Again we will use the igraph library in R, with the graph.adjacency() function creating the network graph object. Recall that V() lets us manipulate the vertices and E() lets us format the edges. Below we set the labels, color, and size for the vertices.
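
A minimal sketch of this step, assuming a small hypothetical adjacency matrix in place of the real one. It uses graph_from_adjacency_matrix(), the newer igraph name for graph.adjacency(); the specific colors and sizes are placeholders, not the values from the original plot.

```r
library(igraph)

# Hypothetical 4-tweet adjacency matrix of shared-term counts.
tweet_adj <- matrix(
  c(1, 0, 1, 0,
    0, 1, 1, 0,
    1, 1, 2, 0,
    0, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("tweet", 1:4), paste0("tweet", 1:4))
)

# diag = FALSE ignores the diagonal, so no tweet is linked to itself.
g <- graph_from_adjacency_matrix(tweet_adj, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Vertex formatting: numeric labels, label color, fill color, and size.
V(g)$label <- seq_len(vcount(g))
V(g)$label.color <- "black"
V(g)$color <- "lightblue"
V(g)$size <- 10

set.seed(42)  # force-directed layouts are random; fix the seed for repeatability
plot(g, layout = layout_with_fr(g))
```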

Barplot of Number of Connections

From the barplot, we see that over 60 tweets share no edges with any other tweet. At the other extreme, one tweet has 59 connections. The median number of connections is 16.
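
A barplot like this can be produced from the vertex degrees. The sketch below assumes a tiny hypothetical graph (three connected tweets and two isolated ones), so its numbers will not match the figures quoted above.

```r
library(igraph)

# Hypothetical graph: tweets 1-3 form a triangle, tweets 4 and 5 are isolated.
g <- make_graph(edges = c(1, 2, 2, 3, 1, 3), n = 5, directed = FALSE)

deg <- degree(g)          # number of connections per tweet
deg_counts <- table(deg)  # how many tweets have each degree

barplot(deg_counts,
        xlab = "Number of connections (degree)",
        ylab = "Number of tweets")

sum(deg == 0)  # tweets that share no edges
max(deg)       # degree of the most connected tweet
median(deg)    # median number of connections
```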

Next we modify the graph object further by accenting the zero-degree vertices, which are selected by index in the idx variable. To understand the content of those isolated tweets, we pull the first 20 characters of each tweet's text from the raw tweet data (you can specify how many characters you want).

Then we change the color and width of the edges according to a scale between the minimum and maximum weights (the weight is the width, or strength, of a connection). This lets us judge each edge's weight relative to the maximum weight. Then we plot the tweet network graphic.
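
The sketch below illustrates both adjustments on hypothetical data: isolated tweets are relabeled with the first 20 characters of their text, and edges are scaled by weight. The variable `tweet_text`, the toy matrix, and the exact scaling formula (including the 0.2 offset that keeps the lightest edge visible) are assumptions, not the original code.

```r
library(igraph)

# Hypothetical raw tweet text and weighted adjacency matrix.
tweet_text <- c("NATO discusses eastern Ukraine ceasefire terms",
                "Crimea referendum results disputed by observers",
                "NATO and Crimea dominate today's headlines again",
                "Completely unrelated message with no shared terms")
tweet_adj <- matrix(c(0, 0, 1, 0,
                      0, 0, 3, 0,
                      1, 3, 0, 0,
                      0, 0, 0, 0), nrow = 4, byrow = TRUE)
g <- graph_from_adjacency_matrix(tweet_adj, mode = "undirected", weighted = TRUE)
V(g)$label <- seq_len(vcount(g))
V(g)$label.color <- "black"

# Accent zero-degree tweets and label them with the first 20 characters of text.
idx <- which(degree(g) == 0)
V(g)$label.color[idx] <- "blue"
V(g)$label[idx] <- substr(tweet_text[idx], 1, 20)

# Scale each edge between the minimum and maximum weight (assumed formula).
w <- E(g)$weight
scaled <- (w - min(w) + 0.2) / (max(w) - min(w) + 0.2)
E(g)$width <- 1 + 3 * scaled                     # heavier edges are drawn wider
E(g)$color <- rgb(0.5, 0.5, 0, alpha = scaled)   # and more opaque

set.seed(42)
plot(g, layout = layout_with_fr(g))
```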

Initial Tweet Network Graphic

The first 20 characters of the zero-degree tweets, shown in blue, surround the network of interconnected tweets. This graphic is cumbersome, so I would like to eliminate the zero-degree tweets and look only at the connected tweets.
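
Removing the isolated tweets can be sketched with delete_vertices(). The graph here is hypothetical. One detail worth noting: igraph renumbers vertex ids after a deletion, so the tweet numbers are stored in the label attribute first.

```r
library(igraph)

# Hypothetical graph: tweets 2-4 are connected, tweets 1 and 5 are isolated.
g <- make_graph(edges = c(2, 3, 3, 4, 2, 4), n = 5, directed = FALSE)
V(g)$label <- seq_len(vcount(g))  # keep the original tweet numbers as labels

# Drop every vertex with no connections.
g2 <- delete_vertices(g, which(degree(g) == 0))

vcount(g2)   # number of tweets left
V(g2)$label  # original tweet numbers of the remaining vertices
```

The labels still read 2, 3, 4 even though the remaining vertices now have internal ids 1 to 3.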

Tweet Network Graphic: Removed Unconnected Vertices

Now with the zero-degree tweets removed, we can get a better view of the tweet network. Additionally, we can delete the edges with low weights to accentuate the connections with heavier weights.

Revised Again Plotting Code:
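
A minimal sketch of the edge-pruning step, assuming a hypothetical weighted graph and a hypothetical threshold `min_weight`; the cutoff used for the actual graphic is not known from the text.

```r
library(igraph)

# Hypothetical weighted adjacency matrix of shared-term counts.
tweet_adj <- matrix(c(0, 1, 3, 0,
                      1, 0, 2, 1,
                      3, 2, 0, 0,
                      0, 1, 0, 0), nrow = 4, byrow = TRUE)
g <- graph_from_adjacency_matrix(tweet_adj, mode = "undirected", weighted = TRUE)
V(g)$label <- seq_len(vcount(g))

min_weight <- 2  # hypothetical threshold: keep edges with at least 2 shared terms

# Delete the light edges, then any tweets left without a connection.
g3 <- delete_edges(g, which(E(g)$weight < min_weight))
g3 <- delete_vertices(g3, which(degree(g3) == 0))

set.seed(42)
plot(g3, layout = layout_with_fr(g3), edge.width = E(g3)$weight)
```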

Tweet Network Graphic: Removed Low Degree Tweets

The new tweet network graphic is much more manageable than the first two, which included the zero-degree tweets and the low-weight edges. We can observe a few tight tweet clusters, at least six.

Tweet Clusters
Now that we have a visual of the tweets and can see how they cluster together with various weights, we would like to read the tweets themselves. For example, let us explore the cluster in the very top right of the graphic, consisting of text numbers 105, 177, 145, 152, 68, 89, 88, 55, 104, 174, and 196.

Code:
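
Reading a cluster amounts to indexing the raw tweet text by the numbers shown in the graphic. In this sketch `tweet_text` is a hypothetical placeholder vector, so it will not reproduce the tweets printed below; in practice it would be the character vector of collected tweets.

```r
# Hypothetical stand-in for the raw tweet text (200 placeholder strings).
tweet_text <- paste("placeholder tweet", seq_len(200))

# Text numbers of the cluster in the top right of the graphic.
cluster_ids <- c(105, 177, 145, 152, 68, 89, 88, 55, 104, 174, 196)

# Print the tweets in the cluster, in the order listed.
tweet_text[cluster_ids]
```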

[1] "@ericmargolis Is Russia or the US respecting the sovereignty and territorial integrity of #Ukraine as per the 1994 Budapest Memorandum????"
[2] "Troops on the Ground: U.S. and NATO Plan PSYOPS Teams in #Ukraine - http://t.co/pXP3TR0uwi #LNYHBT #TEAPARTY #WAAR #REDNATION #CCOT #TCOT"
[3] "US condemns a
