Real-time Event Detection Using Twitter Analysis
Abstract
Users of social networking and microblogging sites such as Facebook and Twitter generate an enormous volume of data. This user-contributed data consists mostly of short messages that tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. Utilizing this feature of Twitter, we present Twitter-Detector, a system that performs event detection over the Twitter stream. The system monitors and identifies emerging events (i.e. ‘topics’) on Twitter in real time and provides meaningful analytics that synthesize an accurate description of each topic. Users interact with the system by ordering the identified events using different criteria and submitting their own descriptions for each event.
We discuss the motivation for event detection over social media streams and the challenges that lie therein. We then describe our approach to event detection, as well as the architecture of Twitter-Detector. Finally, we lay out our demonstration scenario.
Keywords: Twitter analysis, Event detection.
Introduction
Twitter is a popular micro-blogging service that has received much attention in recent times. This online social network is used by millions of people around the world to remain socially connected to their friends, family members, and co-workers through their computers and mobile phones. A status update message, called a tweet, is often used as a message to friends and colleagues. A user can follow other users; a user’s followers can read her tweets on a regular basis. A user who is being followed by another user need not reciprocate by following them back, which makes the links of the network directed. Twitter was launched in July 2006, and its user base has grown rapidly since then. [2] There are 330 million monthly users and 145 million daily active users on Twitter generating 65 million tweets per day. The service is still adding about 300,000 users per day. Many researchers have published studies on Twitter to date, especially during the past year. Most studies can be classified into one of three groups: first, some researchers have sought to analyze the network structure of Twitter. Second, some researchers have specifically examined the characteristics of Twitter as a social medium. Third, some researchers and developers have tried to create new applications using Twitter. [3] Several important instances exemplify the real-time nature of tweets: in the case of an extremely strong earthquake in Nepal, many pictures were transmitted through Twitter. People were thereby able to learn the extent of the damage in Nepal immediately. [4] In another instance, when fire broke out in Notre Dame Cathedral in Paris in April 2019, the first reports were published through Twitter and Tumblr. In this manner, numerous updates result in numerous reports related to events. They include social events such as parties, baseball games, and presidential campaigns. They also include disastrous events such as storms, fires, traffic jams, riots, heavy rainfall, and earthquakes.
[5] There has been a large amount of research in the area of sentiment classification. Generally, tweets are not as thoughtfully composed as reviews. Yet, they still offer companies an additional avenue to gather feedback. The research question of our study is, “Can we detect such event occurrence in real time by monitoring tweets?” This paper presents an investigation of the real-time nature of Twitter that is designed to ascertain whether we can extract valid information from it.
Literature review
Social Networks are a key part of this development, incorporating new information and communication tools and attracting millions of users. Boyd and Ellison (2007) define the term Social Network Sites (SNS), typified by individuals who construct an online profile, communicate with other users, and share common ideas, activities, events, and interests. [2] Location-Based Social Networks further enhance existing social networks, adding a spatial dimension with location-embedded services. For example, users upload geotagged photos via Flickr, check in at a venue with Foursquare, or comment on a local event via Twitter. Geoinformation extracted from these Location-Based Social Networks is usually included under the umbrella of Volunteered Geographic Information [3] (Sui and Goodchild 2011). However, Harvey (2013) argues that this would be more precisely labeled as “contributed” data, since people do not consciously volunteer their data but generate it in the process of using the platforms for their particular purposes. In the case of Twitter, users can post short status messages of up to 140 characters, called “tweets”, which may include photo attachments. These posts can contain specific syntax such as hashtags (#), which act as a keyword or term assigned to a topic the users are discussing or commenting on. Furthermore, a user can subscribe to “follow” or become a “follower” of other users’ tweets, with the possibility of replying directly (@) to all Twitter posts. According to Twitter, about 271 million monthly active users are generating an average of 500 million tweets per day (https://about.twitter.com/company). [4] With the permission of the user, each tweet contains a corresponding geo-location acquired from the GPS sensor within the mobile device. These location-driven social structures allow mobile device owners with ubiquitous internet access to exchange details of their personal location as a key point of interaction (Zheng 2011).
Location-Based Social Networks are bridging the gap between our physical world and online social network services. According to Symeonidis et al. (2014), they contain three layers of information: (1) a social network (user layer); (2) a geographical network (location layer); and (3) a semantic metadata network (content layer). Therefore, user posts on Twitter represent a spatiotemporal signal (geolocation and timestamp of the tweet) with a semantic information layer (the content of the tweet message). After user registration, all tweets can be collected in real time through the official Twitter streaming API (https://dev.twitter.com/docs/api/streaming). [4] The API query allows filtering by keywords and individual user posts to preselect tweets, as well as the possibility of obtaining only georeferenced Twitter messages within a predefined bounding box. Analyzing this spatiotemporal information layer, which is a by-product of individual people’s social interaction, may lead to new insights into spatial structures and underlying patterns. This interdisciplinary and relatively new research field of Location-Based Social Networks lacks commonly used online databases and available literature sources.
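The preselection described above (keyword filtering combined with a bounding box for georeferenced messages) can be illustrated with a minimal local sketch. It does not call the Twitter streaming API; the Tweet record, the preselect function, and the (west, south, east, north) box layout are hypothetical names chosen for this example only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Tweet:
    """Hypothetical record: content layer (text), time and location layers."""
    text: str
    timestamp: float
    lon: Optional[float] = None  # None when the user did not share a location
    lat: Optional[float] = None


# Assumed layout for this sketch: (west, south, east, north) in degrees.
BoundingBox = Tuple[float, float, float, float]


def in_bounding_box(tweet: Tweet, box: BoundingBox) -> bool:
    """Return True only for georeferenced tweets that fall inside the box."""
    if tweet.lon is None or tweet.lat is None:
        return False
    west, south, east, north = box
    return west <= tweet.lon <= east and south <= tweet.lat <= north


def preselect(tweets: Iterable[Tweet], keywords: List[str],
              box: BoundingBox) -> List[Tweet]:
    """Keep tweets that mention any keyword and lie inside the bounding box."""
    lowered = [k.lower() for k in keywords]
    return [
        t for t in tweets
        if in_bounding_box(t, box) and any(k in t.text.lower() for k in lowered)
    ]
```

In this sketch a tweet must satisfy both conditions; a real streaming endpoint may combine its filters differently, so the exact semantics should be checked against the API documentation.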
Implemented model
System Overview
The dataset is first pre-processed. The applications are then categorized according to the attributes under consideration, and the results are presented to the user through the interface we have built.
System Architecture:
Figure 1: Architecture of the system
The architecture in Figure 1 shows the flow of the system and how the interface is built. The steps of the system are explained below:
Twitter-Detector performs event detection in two steps and analyzes events in a third step. First, it identifies ‘bulky/bursty’ keywords, i.e. keywords that appear suddenly in tweets at an unusually high rate. Subsequently, it groups bulky keywords into events based on their co-occurrence. In other words, an event is identified as a set of bulky keywords that occur frequently together in tweets. After an event is identified, Twitter-Detector extracts additional information from the tweets that belong to the event, aiming to discover interesting aspects of it. Each of the three steps described above is pictured as a component of the diagram shown in figure 1 and is described in detail in the following paragraphs.
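The first two steps can be sketched as follows. This is a minimal illustration, not the Twitter-Detector implementation: the burst criterion (current count at least `ratio` times the average count in past windows), the thresholds, and the grouping of keywords by connected components of a co-occurrence graph are all assumptions made for this example, as are the function names.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the words of a tweet and strip simple punctuation."""
    return {w.strip("#@.,!?").lower() for w in text.split()} - {""}


def bursty_keywords(history: List[List[str]], current: List[str],
                    ratio: float = 3.0, min_count: int = 3) -> Set[str]:
    """Keywords whose count in the current window far exceeds their past average."""
    past: Counter = Counter()
    for window in history:
        for tweet in window:
            past.update(tokenize(tweet))
    now: Counter = Counter()
    for tweet in current:
        now.update(tokenize(tweet))
    n_windows = max(len(history), 1)
    bursty = set()
    for word, count in now.items():
        expected = past[word] / n_windows
        # Floor the expectation at 1 so unseen words need at least `ratio` hits.
        if count >= min_count and count >= ratio * max(expected, 1.0):
            bursty.add(word)
    return bursty


def group_into_events(current: List[str], bursty: Set[str],
                      min_cooccurrence: int = 2) -> List[Set[str]]:
    """Group bursty keywords that co-occur in enough tweets into events."""
    pair_counts: Counter = Counter()
    for tweet in current:
        present = sorted(tokenize(tweet) & bursty)
        pair_counts.update(combinations(present, 2))
    neighbours: Dict[str, Set[str]] = {w: set() for w in bursty}
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            neighbours[a].add(b)
            neighbours[b].add(a)
    events, seen = [], set()
    for word in sorted(bursty):
        if word in seen:
            continue
        component, stack = set(), [word]
        while stack:  # depth-first walk over the co-occurrence graph
            w = stack.pop()
            if w not in component:
                component.add(w)
                stack.extend(neighbours[w] - component)
        seen |= component
        events.append(component)
    return events
```

Each returned set is one candidate event; the tweets containing its keywords would then be passed to the third, analysis step.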
Homescreen
Figure 2:- Home screen of event detection using Twitter analysis
The home screen of event detection using Twitter analysis is shown in Fig. 2. Twitter-Detector treats bulky keywords as ‘entry points’ for event detection. In other words, whenever a keyword exhibits bulky behavior, Twitter-Detector considers this an indication that a new topic has emerged and seeks to explore it further. Effective and efficient detection of bulky keywords is thus crucial to Twitter-Detector’s performance.
Twitter Generation Prediction:
It consists of data regarding Twitter in a particular area. Accordingly, it generates the value of solar irradiance.
Data Consumption Log:
It consists of a list of the data consumed by each individual.
Data Generation Prediction:
Figure 3:- Details for checking data generation
The first step is to generate a limited amount of data. For this, we collect a Twitter data set. This data set is used to analyze the relationship between Twitter and irradiance. After the city, state, and country code are entered manually (as shown in Fig. 3), the analysis collects messages from that city and calculates the value of irradiance accordingly.
Energy Consumption Log
The energy consumption log consists of a list of user data containing each user’s name, email address, and energy consumption. This data is used to keep a record of the energy consumed by all users. It comprises two parts:
Add Log:
It is a manual data-entry process in which each user’s consumption data is entered.
Figure 4:- Entering details for energy consumption data of the user
After entering this data, the following will appear on the screen.
Figure 5:- Adding Log
Existing Log:
The existing log comprises a list of data on the users that consume energy. This data is used by a machine learning model to predict the user’s future energy consumption. That amount of energy is then transmitted to the user, which ensures that energy is not wasted.
Figure 6:- List of user data in existing log
System Model:
Under the Tweets section, you can find a list of all your Tweets and the number of impressions. You can see individual Tweet performance, as well as recent months or a 28-day overview of cumulative impressions. Capitalize on this information by repurposing Tweets that gained the most impressions, or creating Tweets on a similar subject. You can also use the cumulative overview to compare monthly activity. What did you do differently in a month with higher impressions? Did you Tweet more frequently? Take a look and see how you can recreate months that earned you high impressions. Another option is to try out Promoted Tweets, which will help your content reach more people.
We analyzed the pre-collected data set of temperature and irradiance, to which we apply a linear regression model. Using this model, we predict irradiance (the dependent variable) from Twitter data (the independent variable) collected as the weather changes. The value of irradiance helps us estimate the amount of energy that needs to be generated. This energy is transmitted to all locations. Transmitted energy that is not required in one place is redistributed to another user who is requesting more energy, using the FCFS (First Come First Served) algorithm: the user who requests first receives the redistributed energy, irrespective of the amount of energy required by the other users.
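The two techniques named above can be sketched in a few lines. This is an illustration under stated assumptions rather than the system's code: the regression is a single-variable ordinary least-squares fit, a request is allowed to be partially filled when the surplus runs out, and the names fit_line, predict, and redistribute_fcfs are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x: float) -> float:
    """Predicted value of the dependent variable (here, irradiance)."""
    slope, intercept = model
    return slope * x + intercept


def redistribute_fcfs(surplus: float,
                      requests: List[Tuple[str, float]]) -> Dict[str, float]:
    """Serve requests strictly in arrival order until the surplus is used up."""
    queue = deque(requests)
    allocations: Dict[str, float] = {}
    while queue and surplus > 0:
        user, amount = queue.popleft()
        granted = min(amount, surplus)  # assumption: partial fills are allowed
        allocations[user] = allocations.get(user, 0.0) + granted
        surplus -= granted
    return allocations
```

Because requests are served in arrival order, a later user may receive nothing even if their request is small, which matches the behavior described in the text.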
Results
We first present our results for the objective/subjective and positive/negative classifications. These results act as the first step of our classification approach. We use only the short-listed features for both of these results. This means that for objective/subjective classification we have 5 features and for positive/negative classification we have 3 features. For both of these results, we use the Naïve Bayes classification algorithm because that is the algorithm we employ in the first step of our actual classification approach. Furthermore, all the figures reported are the result of 10-fold cross-validation. We take the average of the 10 values obtained from the cross-validation.
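The evaluation procedure (Naïve Bayes scored by 10-fold cross-validation, with the fold results averaged) can be sketched as below. The short-listed features are not listed in this section, so this example assumes bag-of-words tokens and a multinomial Naïve Bayes with add-one smoothing; folds are assigned by index modulo k without stratification. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence


def train_nb(docs: Sequence[List[str]], labels: Sequence[str]):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
        vocab.update(doc)
    return class_counts, word_counts, vocab


def predict_nb(model, doc: List[str]) -> str:
    """Pick the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in doc:
            if word in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(docs: Sequence[List[str]], labels: Sequence[str],
                   k: int = 10) -> float:
    """Average accuracy over k folds; requires at least k documents."""
    fold_accuracies = []
    for fold in range(k):
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        model = train_nb([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / k
```

Calling cross_validate(docs, labels) returns the single averaged figure; the same routine would be run once for the objective/subjective labels and once for the positive/negative labels.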
Focus on renewable energy
Data storage is also provided so that the data can later be used to check the number of users. The interface we built shows the various correlations among the selected attributes and how they affect the overall ratings of the applications.
When a dataset is uploaded, it takes time to load onto the system; the information is then pre-processed, analyzed, and presented to us as the predicted result.
The graph below (Figure 9) depicts the prediction of irradiance using temperature. In this graph, the actual irradiance given in the data set is denoted by a blue line and the predicted irradiance by an orange line.
Figure 9:- Graph of Generation of irradiance with the help of Twitter
Figure 10:- Graph of Generation of data with the help of irradiance
The above graph shows the data analysis done by the linear regression algorithm and the energy generation predicted with the help of irradiance, which is also shown in Table II and figure I. In the above graph, the blue line depicts the actual amount of energy generated and the orange line denotes the predicted amount.
Conclusion
The task of sentiment analysis, especially in the domain of micro-blogging, is still in the developing stage and far from complete. We therefore propose a couple of ideas that we feel are worth exploring in the future and that may further improve performance. At present we focus only on unigrams; the effect of bigrams and trigrams may also be explored. As reported in the literature review section, using bigrams along with unigrams usually enhances performance.
Future Scope
The future scope of this work is broad, and the approach can be applied in several ways and not just to Twitter. Twitter, a micro-blogging service, is estimated to have about 200 million registered users, who create approximately 65 million tweets a day. Twitter users usually express their opinions about topics of interest to them.
References
- Boguslavsky, I. (2017). Semantic Descriptions for a Text Understanding System. In Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference “Dialogue” (2017) (pp. 14-28).
- Pak, A., & Paroubek, P. (2010, May). Twitter as a corpus for sentiment analysis and opinion mining. In LREC (Vol. 10, No. 2010).
- Scott, J. (2011). Social network analysis: developments, advances, and prospects. Social network analysis and mining, 1(1), 21-26.
- Statista, 2017, https://www.statista.com/statistics/282087/number-ofmonthly-active-twitter-users/
- Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012, July). A system for real-time twitter sentiment analysis of the 2012 us presidential election cycle. In Proceedings of the ACL 2012 System Demonstrations (pp. 115-120). Association for Computational Linguistics.
- TextBlob, 2017, https://textblob.readthedocs.io/en/dev/
- Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1-135.
- Dos Santos, C. N., & Gatti, M. (2014, August). Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts.
- Wilson, T., Wiebe, J., & Hoffmann, P. (2005, October). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on human language technology and empirical methods in natural language processing (pp. 347-354). Association for Computational Linguistics.
- Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1), 1-167.
- Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011, June). Sentiment analysis of Twitter data. In Proceedings of the workshop on languages in social media (pp. 30-38). Association for Computational Linguistics.
- Rosenthal, S., Farra, N., & Nakov, P. (2017). SemEval-2017 task 4: Sentiment analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) (pp. 502-518).
- Kiritchenko, S., Zhu, X., & Mohammad, S. M. (2014). Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, 50, 723-762.
- Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., Van Der Goot, E., Halkia, M., … & Belyaeva, J. (2013). Sentiment analysis in the news. arXiv preprint arXiv:1309.6202.
- Poria, S., Cambria, E., & Gelbukh, A. (2015). Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 2539-2544).