2017-08-22
Apache Storm Tweets Sentiment Analysis

https://github.com/P7h/StormTweetsSentimentD3UKViz



You might also be interested in my other project, StormTweetsSentimentD3Viz, which shows the Twitter sentiment of the states of the US on a D3.js choropleth map.

Introduction

This repository contains an application built to demonstrate the Storm distributed framework by performing sentiment analysis of tweets originating from the UK in real time. The topology retrieves tweets originating from the UK, computes the sentiment score of each county / region of the UK [based on tweets], and visualizes the scores in a choropleth map using D3.js, continuously for 10 minutes [in local mode]. The user can also kill the topology explicitly by pressing Ctrl+C to exit the application.

Apache Storm is an open source distributed real-time computation system, developed at BackType by Nathan Marz and team. It was open sourced by Twitter [post BackType acquisition] in August 2011, and became a top-level Apache project on 29th September, 2014.
This application was developed and tested with Storm v0.8.2 on Windows 7 in local mode, and was later updated and tested with Storm v0.9.3 on 22nd January, 2015. It may or may not work with versions earlier or later than Storm v0.9.3.

This application has been tested in:

  • Local mode on an Ubuntu virtual machine and on a Microsoft Windows 7 machine.
  • Cluster mode on a private cluster of 4 machines and on an Amazon EC2 environment of 5 machines; all machines in the private cluster ran Ubuntu, while the EC2 machines ran CentOS.
    • The recent update to Apache Storm v0.9.3 has not been tested in cluster mode.

Features
  • The application retrieves tweets from the Twitter Streaming API (using Twitter4J).
  • It analyses the sentiment of all tweets originating from the UK [based on a lat-long bounding box].
  • There are three different objects within a tweet that can be used to determine its origin. In order of priority [high to low], they are:
    • The coordinates object. -- This project relies solely on the coordinates object of a tweet.
    • The place object. -- Not considered in this project.
    • The user object. -- Not considered in this project.
  • For reverse geocoding, this application uses the Google Maps API.
    • For more information and to sign up, please check the Google Maps API page.
    • Please note that you need a Google account to sign up for a Google Maps API key.
      • For previous reverse geocoding experiments, I chose Bing Maps over Google Maps, since Google Maps was too restrictive for our usage: it has a limit of only 2,500 requests per day, while Bing allows 50k requests per day.
      • But Bing's geocoding for the UK is really not up to the mark. The Google Maps API is far better and translates almost every lat-long request to a place, so I went with it for this project because of its greater accuracy.
      • For perspective, Bing Maps could convert only 3 out of 400 lat-long requests to a location, while the Google Maps API converted almost 95% of them.
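
The bounding-box check on a tweet's coordinates can be sketched roughly as below. This is only an illustration in plain Java, not the project's code: the class name `UkBoundingBox`, its methods, and the corner values are all assumptions chosen to roughly enclose the UK. The Streaming API `locations` filter expects longitude before latitude with the south-west corner first, which is the order `toLocations` produces.

```java
public class UkBoundingBox {
    // Approximate corners of a box around the UK. These numbers are
    // illustrative assumptions, not the values used by the project.
    public static final double MIN_LAT = 49.8;
    public static final double MAX_LAT = 60.9;
    public static final double MIN_LON = -8.7;
    public static final double MAX_LON = 1.8;

    /** Returns true when the point lies inside the box or on its edge. */
    public static boolean contains(double lat, double lon) {
        return lat >= MIN_LAT && lat <= MAX_LAT
            && lon >= MIN_LON && lon <= MAX_LON;
    }

    /**
     * The same box as {south-west, north-east} pairs in
     * {longitude, latitude} order.
     */
    public static double[][] toLocations() {
        return new double[][] { { MIN_LON, MIN_LAT }, { MAX_LON, MAX_LAT } };
    }

    public static void main(String[] args) {
        System.out.println(contains(51.5074, -0.1278)); // London: true
        System.out.println(contains(48.8566, 2.3522));  // Paris: false
    }
}
```

A rectangle this coarse also covers places outside the UK (Dublin, for example, falls inside these corners), so a point that passes the check is only a candidate; the reverse geocoding step is what maps it to an actual county / region.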

  • This application uses AFINN, a list of words with pre-computed sentiment scores.
    • These words are used to determine the sentiment of each tweet retrieved through the Streaming API.
  • From the sentiment values, we can find the happiest county / region of the UK and the unhappiest one as well.
  • For visualization, I am using D3 to display the sentiment value of each county / region in real time as a color appropriate to that value. The color of a county / region moves from red to green as the sentiment value increases.
  • The codebase has been commented wherever required.
  • This project is compatible with both Eclipse IDE and IntelliJ IDEA. Import the project into your favorite IDE [with the Maven plugin installed] and you can quickly follow the code.
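
To make the AFINN idea concrete, here is a minimal sketch of word-list scoring and of picking the happiest and unhappiest region. It is an assumption-laden illustration, not the project's implementation: the class `AfinnSketch`, its method names, the four-word excerpt, and the choice to keep a running sum per region are all hypothetical. The real AFINN list is far larger and assigns each word an integer between -5 and +5.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class AfinnSketch {
    // Tiny illustrative excerpt in the AFINN style: word -> integer score.
    // A real run would load the full word list from a file instead.
    private static final Map<String, Integer> AFINN = new HashMap<>();
    static {
        AFINN.put("good", 3);
        AFINN.put("love", 3);
        AFINN.put("bad", -3);
        AFINN.put("hate", -3);
    }

    // Running sentiment total per region (an assumed aggregation scheme).
    private final Map<String, Integer> regionTotals = new HashMap<>();

    /** Sums the scores of every listed word in the tweet; unknown words count as 0. */
    public static int score(String tweet) {
        int total = 0;
        for (String token : tweet.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            Integer value = AFINN.get(token);
            if (value != null) {
                total += value;
            }
        }
        return total;
    }

    /** Adds the tweet's score to the running total of its region. */
    public void record(String region, String tweet) {
        Integer current = regionTotals.get(region);
        regionTotals.put(region, (current == null ? 0 : current) + score(tweet));
    }

    /** Region with the highest total, or null when nothing was recorded. */
    public String happiest() {
        return extreme(true);
    }

    /** Region with the lowest total, or null when nothing was recorded. */
    public String unhappiest() {
        return extreme(false);
    }

    private String extreme(boolean highest) {
        String best = null;
        for (Map.Entry<String, Integer> e : regionTotals.entrySet()) {
            if (best == null
                    || (highest && e.getValue() > regionTotals.get(best))
                    || (!highest && e.getValue() < regionTotals.get(best))) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        AfinnSketch sketch = new AfinnSketch();
        sketch.record("Kent", "I love this good weather");
        sketch.record("Essex", "I hate this bad traffic");
        sketch.record("Kent", "bad day");
        System.out.println(sketch.happiest() + " / " + sketch.unhappiest());
    }
}
```

Summing raw scores favours regions with more tweets; averaging per tweet would be an equally plausible choice, and the sketch does not claim to match what the repository does.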

Note: Huge thanks to my colleague Ganesh Sastry for his help in generating the TopoJSON of the UK and getting the UK map right.


All replies
2017-8-22 07:37:34
Thanks for sharing, OP!