Event attendance prediction using social media
Popular social media applications on smartphones (e.g. Facebook, Instagram, Twitter) have enabled the creation of an unprecedented amount of user-generated content. This content can be used to extract valuable information about human dynamics and behavior, such as mobility. Popular events such as music festivals attract thousands of participants, and their presence is usually well reflected on social media: people connect with “the event” by posting about their feelings, experiences or opinions, often well in advance of its planned date.

Fig. 1. Examples of tweets posted before, during and after the event
Given the attention that popular events receive on social media, we tackle a novel problem: is it possible to infer from Twitter posts whether a user actually attends the event they mention? To answer this question, we conducted experiments on data from two large music festivals in the UK, the VFestival and Creamfields. The research has been published in two venues: a preliminary study at the ASONAM conference (ref.1) and a more detailed study in the Information Processing & Management (IPM) journal (ref.2).
The simplest way to infer users’ presence at events is to use the geotags associated with their posts: a “check-in”, or a user location at the event venue at the time of the event, can trivially be taken as evidence of attendance. This approach has two drawbacks, however. First, few social media users enable geotagging of their posts (on Twitter, only about 2% of posts are geotagged), so learning attendance classifiers from such sparse data would be difficult and could lead to ineffective predictive models. Second, geolocated data does not capture a user’s intention to attend the event. To avoid both issues, we infer users’ actual attendance at an event relying only on the content of non-geotagged posts, without any spatial features.
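To make the geotag baseline concrete, the following Python sketch labels a post as evidence of attendance when its coordinates fall within a radius of the venue during the event window. The venue coordinates, radius, dates and function names are hypothetical placeholders chosen for illustration; this is not the code used in the study.

```python
import math
from datetime import datetime, timezone

# Hypothetical venue and event window: placeholders, not values from the study.
EVENT_LAT, EVENT_LON = 53.0, -2.7
EVENT_RADIUS_KM = 1.5
EVENT_START = datetime(2016, 8, 26, 12, tzinfo=timezone.utc)
EVENT_END = datetime(2016, 8, 28, 23, tzinfo=timezone.utc)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def geotag_says_attended(post):
    """True if a geotagged post lies inside the venue area during the event.

    Returns None for posts without coordinates, i.e. no label at all.
    """
    if post.get("lat") is None or post.get("lon") is None:
        return None
    inside = haversine_km(post["lat"], post["lon"], EVENT_LAT, EVENT_LON) <= EVENT_RADIUS_KM
    during = EVENT_START <= post["time"] <= EVENT_END
    return inside and during


t = datetime(2016, 8, 27, 15, tzinfo=timezone.utc)
print(geotag_says_attended({"lat": 53.0005, "lon": -2.7, "time": t}))  # True
print(geotag_says_attended({"lat": None, "lon": None, "time": t}))     # None
```

Because most posts carry no coordinates, the function returns `None` for the large majority of them, which is the sparsity problem described above; it also says nothing about a user's intention to attend before the event starts.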
For the event attendance classification task, we define three temporal intervals according to when posts were shared on social media: before, during, or after the event, and we propose a separate classification task for each. Posts shared before the event serve as a predictor of users' actual attendance, posts shared during the event reflect their actual participation, and posts shared after the event give an overview of past attendance. We devise four categories of features, each reflecting a different facet of social media: the textual, temporal, social, and multimedia dimensions.
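The split into temporal intervals, and the idea of one feature group per dimension, can be sketched as follows. This is a minimal illustration under assumed names: the event window, the keyword list and the single toy feature chosen for each dimension are hypothetical and much simpler than the features used in the study.

```python
from datetime import datetime, timezone

# Hypothetical event window; real times would come from the festival schedule.
EVENT_START = datetime(2016, 8, 26, 12, tzinfo=timezone.utc)
EVENT_END = datetime(2016, 8, 28, 23, tzinfo=timezone.utc)

# Illustrative keyword list, NOT the study's feature set.
INTENT_WORDS = {"going", "tickets", "camping", "lineup", "can't wait"}


def interval_of(post_time):
    """Assign a post timestamp to the 'before', 'during' or 'after' interval."""
    if post_time < EVENT_START:
        return "before"
    if post_time <= EVENT_END:
        return "during"
    return "after"


def toy_features(post):
    """One toy feature per dimension: textual, temporal, social, multimedia."""
    text = post["text"].lower()
    return {
        # textual: number of distinct keywords found (naive substring match)
        "intent_words": sum(1 for w in INTENT_WORDS if w in text),
        # temporal: signed distance in hours from the event start
        "hours_to_start": (post["time"] - EVENT_START).total_seconds() / 3600,
        # social: number of @-mentions of other accounts
        "mentions": text.count("@"),
        # multimedia: whether the post carries a photo or video
        "has_media": int(bool(post.get("media"))),
    }


post = {
    "text": "Got my tickets for @Creamfields, can't wait!",
    "time": datetime(2016, 8, 20, 12, tzinfo=timezone.utc),
    "media": ["photo.jpg"],
}
print(interval_of(post["time"]))  # before
print(toy_features(post))
# {'intent_words': 2, 'hours_to_start': -144.0, 'mentions': 1, 'has_media': 1}
```

In a setup like this, posts would be grouped by interval first, and a separate classifier trained on the features of each group, mirroring the three classification tasks.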

Fig. 2. Heatmap of the hometowns of the inferred attendees of the Creamfields festival (festival location marked by the red point).
The "before" case is particularly interesting, since early knowledge of likely attendance can support innovative services and applications. For example, event organizers or third-party companies could target their advertising campaigns precisely by offering personalized services to the users most likely to attend. Another relevant example is transportation planning: attendance prediction could allow organizers or local authorities to encourage potential attendees to use public transport, and could help bus and shuttle companies plan and advertise collective transport services to the event.
The resulting models achieve high accuracy, with the best result observed for the Creamfields festival: about 91% accuracy in classifying users who expressed their intention to attend the event. Word embedding features are among the most prominent features and contribute substantially to this performance. Visual content is a growing trend on social media, and analyzing it with deep learning techniques is a promising way to further improve the classification.
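One common way to turn word embeddings into a fixed-length feature for a post is to average the vectors of the words it contains. The sketch below assumes this averaging scheme and uses made-up three-dimensional vectors in place of a pre-trained model such as word2vec or GloVe; the exact way the study aggregated embeddings is not stated here, and the names are hypothetical.

```python
# Toy vectors standing in for a pre-trained embedding model; values are made up.
TOY_EMBEDDINGS = {
    "tickets": [0.9, 0.1, 0.0],
    "camping": [0.7, 0.3, 0.1],
    "sold": [0.2, 0.8, 0.0],
    "out": [0.0, 0.6, 0.2],
}


def average_embedding(text, embeddings, dim=3):
    """Mean vector of in-vocabulary tokens; a zero vector if none match.

    Uses a naive lowercase whitespace tokenizer, so punctuation stays
    attached to words and such tokens are treated as out-of-vocabulary.
    """
    vectors = [embeddings[tok] for tok in text.lower().split() if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) walks the vectors component by component
    return [sum(component) / len(vectors) for component in zip(*vectors)]


print(average_embedding("No tickets here", TOY_EMBEDDINGS))  # [0.9, 0.1, 0.0]
```

The resulting vector can be concatenated with the temporal, social and multimedia features before training a classifier; averaging keeps the feature length fixed regardless of post length, at the cost of ignoring word order.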
Exploratory:
- T10.3 Sustainable Cities for Citizens
Items in the Catalogue:
- Data: https://ckan-sobigdata.d4science.org/dataset/twitter_dataset_about_two_premier_uk_music_festivals
- Method: https://ckan-sobigdata.d4science.org/dataset/event_attendance_prediction_using_twitter
Sustainable Goals:
- Goal 11: Sustainable cities and communities
References:
- Vinicius Monteiro de Lira, Craig Macdonald, Iadh Ounis, Raffaele Perego, Chiara Renso, Valéria Cesário Times: Exploring Social Media for Event Attendance. ASONAM 2017: 447-450
- Vinicius Monteiro de Lira, Craig Macdonald, Iadh Ounis, Raffaele Perego, Chiara Renso, Valéria Cesário Times: Event attendance classification in social media. Inf. Process. Manage. 56(3): 687-703 (2019)