In my quest to practice R and learn text mining, I am looking at one of the popular Twitter wars between two political personalities of India, fondly known in the Twitterverse as ‘Pappu’ and ‘Feku’, which are basically their pet names (‘ghar ka naam’ or ‘pyar wala naam’). The origin of the names is beyond the scope of this post. What I wanted to find out is what people tweet about when they are fondly remembering these two prominent personalities. To do this, I wrote a text mining program in R, a popular open source language (technical details and code are shared later in the post). I used the hashtags #pappu and #feku to fetch the relevant tweets, and was able to fetch 1089 tweets for #pappu and 1140 tweets for #feku. After removing common words (stopwords) like ‘and’, ‘is’, ‘are’, ‘the’, etc., I created a word cloud for each hashtag to visually represent the data. To reduce noise and get a less cluttered picture, only words that appeared at least 10 times (min.freq = 10) were selected. Below is what I found.
The words in bigger font sizes are the ones that occur most often, and words of the same size have the same frequency (the colors are picked at random from the palette, since the clouds were drawn with random.color = T). For example, in the case of #pappu, ‘state’ and ‘mind’ are the most frequently occurring terms. It is also important to note that Twitter data is basically temporal, and the value of insights derived from it tends to decay quickly over time.
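As a rough illustration of the counting behind a word cloud, here is a minimal sketch in base R of stopword removal followed by a minimum-frequency cut. The sample tweets, the short stopword list and the helper `word_freq()` are all made up for this example; the actual analysis in this post used tm and wordcloud on the fetched tweets.

```r
# Hypothetical sample tweets; the real input would be the text of the fetched tweets
tweets <- c(
  "State of mind is the key #pappu",
  "Poverty is a state of mind, says #pappu",
  "The state and the mind are trending"
)

# A tiny hand-picked stopword list for illustration (tm::stopwords() is much longer)
stop_words <- c("and", "is", "are", "the", "of", "a")

# Hypothetical helper: lower-case, strip punctuation, drop stopwords, count words
word_freq <- function(texts, stop_words, min_freq = 1) {
  cleaned <- tolower(texts)
  cleaned <- gsub("[[:punct:]]", "", cleaned)          # drop punctuation, including '#'
  words <- unlist(strsplit(cleaned, "[[:space:]]+"))   # split on whitespace
  words <- words[nzchar(words) & !(words %in% stop_words)]
  freq <- sort(table(words), decreasing = TRUE)
  freq[freq >= min_freq]                               # keep only frequent words
}

freq <- word_freq(tweets, stop_words, min_freq = 2)
print(freq)
```

With a real set of tweets, raising `min_freq` trims the long tail of rare words, which is what keeps the picture less cluttered.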
Anyway, here we see the most popular words associated with #pappu and #feku. The first thing that comes to mind is that there is more diversity in the tweets about #feku. The most likely reason is that the said personality has been active in political life a lot longer and has done and said many more things, giving people more to tweet about and more room to criticize or lament. On the other hand, there is not much to criticize about #pappu, because nothing has really been done apart from statements and comments here and there.
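The diversity observation is a visual impression from the word clouds. One possible way to put a number on it, sketched here in base R, is to compare how many distinct words each hashtag has and how many of them clear the frequency cut. The word vectors and the helper `vocab_summary()` are hypothetical stand-ins, not the real data or the method used in this post.

```r
# Hypothetical cleaned word vectors standing in for each hashtag's tweets
pappu_words <- c("state", "mind", "state", "mind", "comments", "state")
feku_words  <- c("development", "claims", "communal", "development",
                 "image", "claims", "communal", "media")

# Hypothetical helper: basic vocabulary statistics for one set of words
vocab_summary <- function(words, min_freq = 2) {
  freq <- table(words)
  c(
    total_words    = length(words),
    distinct_words = length(freq),
    frequent_words = sum(freq >= min_freq)  # words that would survive the cut
  )
}

summary_table <- rbind(
  pappu = vocab_summary(pappu_words),
  feku  = vocab_summary(feku_words)
)
print(summary_table)
```

A larger `distinct_words` or `frequent_words` count for one hashtag would support the impression that its tweets cover more ground, though the two samples should be of similar size for the comparison to be fair.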
There are many other obvious things here, and one can really see how the two camps are trying to politically define the two. In the case of #pappu, it is his recent comments as well as the controversy about in-laws and family in general. For #feku, it is largely about his claims of development and the allegations about his communal image. In both cases, the discourse is shaped and driven by the political opinions in the mainstream narrative about the two personalities.
One really interesting thing (interesting because I did not anticipate it :P) that came out here is that I was also able to get information about the most enthusiastic/active tweeters of both #pappu and #feku. You will notice various Twitter handles in the word clouds; these are basically the handles that tweet a lot with the given hashtags. One can simply search for these handles on Twitter to find out more. Give it a try, some might surprise you. It might also be worth exploring how much of the overall tweet content is driven by such users and how much is unique; however, that would be the subject of another post.
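For anyone who wants to explore the handles more directly, here is a minimal base R sketch that counts @handles in the raw tweet text and measures what share of tweets are not exact repeats. The sample tweets and the handles `@user_a` and `@user_b` are made up, and this is only one assumed way of approaching the question.

```r
# Hypothetical tweets; @user_a and @user_b are made-up handles
tweets <- c(
  "RT @user_a: state of mind #pappu",
  "RT @user_a: state of mind #pappu",
  "@user_b what a speech #pappu",
  "nothing new today #pappu"
)

# Pull every @handle out of the raw text (before punctuation is stripped)
matches <- regmatches(tweets, gregexpr("@[A-Za-z0-9_]+", tweets))
handle_counts <- sort(table(tolower(unlist(matches))), decreasing = TRUE)
print(handle_counts)

# Share of tweets whose text is not an exact repeat of an earlier tweet
unique_share <- mean(!duplicated(tweets))
print(unique_share)
```

Counting handles on the raw text matters because punctuation removal strips the `@`, after which a handle looks like any other word; that is probably why handles turn up in the clouds at all.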
Overall, I wish to track this over time and see how the discourse is shaped as we near the general elections in 2014, so keep watching. Let me know your thoughts, comments and ideas.
To construct the above, I used three important R packages: twitteR, tm and wordcloud, along with RJSONIO and RColorBrewer.
Unfortunately, I was not able to fetch the full 1500 tweets I asked for; this might have something to do with the restrictions on the GET API in Twitter.
Below is the code:
```r
library("ROAuth")        # OAuth for the Twitter API
library("twitteR")
library("RJSONIO")       # To resolve issues with JSON format; only load this after loading twitteR
library("wordcloud")
library("tm")
library("RColorBrewer")  # Color palettes for the word clouds

# Directly loading a saved authentication file (on later runs)
# load("twitter_auth.Rdata")

# Necessary step for Windows
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")

# Registering on the Twitter API, only the first time
reqURL    <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL   <- "https://api.twitter.com/oauth/authorize"

# To fetch your consumer key go to https://twitter.com/apps/new and log in; read the twitteR documentation for details
consumerKey    <- "dummy"
consumerSecret <- "dummy"
Cred <- OAuthFactory$new(consumerKey = consumerKey,
                         consumerSecret = consumerSecret,
                         requestURL = reqURL,
                         accessURL = accessURL,
                         authURL = authURL)
Cred$handshake(cainfo = "cacert.pem")
# IMPORTANT: Run till the above line first; a PIN will be asked for, enter the PIN and proceed

# Save for later use and fetch using load() as mentioned above
save(Cred, file = "twitter_auth.Rdata")
registerTwitterOAuth(Cred)

# Selecting a color palette for the word clouds
pal2 <- brewer.pal(8, "Pastel2")

# searchTwitter for #pappu
pappu <- searchTwitter('#pappu', n = 1500, lang = 'en', cainfo = "cacert.pem")
pappu <- sapply(pappu, function(x) x$getText())   # keep only the tweet text
pappu_corpus <- Corpus(VectorSource(pappu))
# Note: with tm >= 0.6, wrap base functions, e.g. tm_map(pappu_corpus, content_transformer(tolower))
pappu_corpus <- tm_map(pappu_corpus, tolower)
pappu_corpus <- tm_map(pappu_corpus, removePunctuation)
pappu_corpus <- tm_map(pappu_corpus, removeWords, stopwords())
wordcloud(pappu_corpus, scale = c(4, 1), min.freq = 10, random.order = TRUE,
          random.color = TRUE, colors = pal2)

# searchTwitter for #feku
feku <- searchTwitter('#feku', n = 1500, lang = 'en', retryOnRateLimit = 120,
                      retryCount = 5, cainfo = "cacert.pem")
feku <- sapply(feku, function(x) x$getText())     # keep only the tweet text
feku_corpus <- Corpus(VectorSource(feku))
feku_corpus <- tm_map(feku_corpus, tolower)
feku_corpus <- tm_map(feku_corpus, removePunctuation)
feku_corpus <- tm_map(feku_corpus, removeWords, stopwords())
wordcloud(feku_corpus, scale = c(2, 1), min.freq = 10, random.order = TRUE,
          random.color = TRUE, colors = pal2)
```
I have learned the above largely from: