
SoBigData Articles

Computational methods for the analysis of online hate speech against refugees and migrants - Part 2 -

Exploratory: migration studies

Expanding the Detection of Online Hate Speech against Refugees and Migrants – The PHARM project

Building on the initial prototype developed in the STOP-HATE project (see our previous post), this knowledge has been transferred to a larger project, Preventing Hate Against Refugees and Migrants (PHARM). PHARM is a European project funded by the European Commission under the Rights, Equality and Citizenship Programme (REC-RRAC-RACI-AG-2019, GA n. 875217). Its main objective is to detect online hate speech against refugees and migrants on a large scale: not only in Spain, but also in Italy and Greece, and not only on Twitter, but also in other sources where this type of hate speech may spread. While the proof of concept modelled hate speech on Twitter addressed to four stigmatized groups, this new project focuses solely on refugees and migrants as a vulnerable group, and aims to model the relationship between that hate speech and hate crimes.

Building on the methodological advances of STOP-HATE, and to extend the impact of that work, PHARM is also being developed by the Observatory for Audiovisual Contents at the University of Salamanca (Spain), in a consortium with the University of Milano (Italy) and the University of Thessaloniki (Greece). The project runs from 2020 until 2022. Its main goal is to model and monitor hate speech and hate crimes against refugees and migrants in Greece, Italy and Spain, in order to predict increases in those crimes and to combat them using cutting-edge techniques such as data journalism and narrative persuasion.

The detection component builds on the online hate speech classification strategy developed in STOP-HATE, which combines large-scale text classification, supported by the SCAYLE supercomputing infrastructure, with natural language processing and supervised machine learning techniques. The main challenges are threefold: identifying and reducing online hate speech against displaced people, not only on Twitter but also in other online sources such as YouTube, Instagram, and the websites and blogs of media outlets, political parties and associations; predicting possible hate crimes in different geographical locations across the three countries; and countering hate speech against refugees and migrants and its effects. To this end, the first aim is to build empirical evidence on the relationship between the hate speech being modelled and hate crimes in the three countries. The project will also attempt to counteract the effects of this hate speech through data journalism and narrative persuasion.
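To make the supervised classification step more concrete, here is a minimal sketch of a text classifier written in plain Python. It is not the model used in STOP-HATE or PHARM: we swap in a multinomial Naive Bayes classifier purely for illustration, and the `hate`/`not_hate` labels, the whitespace tokenizer and the placeholder training messages (tokens such as `insultA` stand in for real abusive terms) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase whitespace tokenization; a production pipeline would use a proper NLP tokenizer.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative stand-in)."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesTextClassifier":
        self.class_counts = Counter(labels)      # documents per class, used for the priors
        self.word_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        # Unnormalized log posterior: log P(class) + sum of log P(token | class).
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, text: str) -> str:
        # Return the class with the highest log posterior.
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

In practice such a model would be trained on a large, manually annotated corpus per language; the point here is only the shape of the supervised pipeline: labelled texts in, a per-class score for each new message out.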

Graphic scheme of the PHARM project

Regarding how these objectives will be achieved, the team will first identify online hate speech content and its potential sources, and develop empirical knowledge of how that content can predict real, geo-localized hate crimes.

Second, hate messages will be identified and monitored in real time using the deep-learning detection models already developed in STOP-HATE, this time trained on a larger corpus, applied to more sources and covering all three languages.
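A real-time monitor of this kind can be sketched as a loop that routes each incoming message to a model for its language and keeps only the messages scored above a threshold. In the sketch below, the `Message` fields, the language codes, the 0.5 threshold and the keyword-based scorer are all hypothetical; the keyword scorer merely stands in for the trained deep models, and `slur_a`/`slur_b` are placeholder tokens rather than a real lexicon.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Tuple

Scorer = Callable[[str], float]  # maps a text to an estimated probability of hate speech


@dataclass
class Message:
    source: str    # e.g. "twitter", "youtube", "blog" (hypothetical labels)
    language: str  # "es", "it" or "el"
    text: str


def monitor(stream: Iterable[Message], scorers: Dict[str, Scorer],
            threshold: float = 0.5) -> Iterator[Tuple[Message, float]]:
    """Lazily yield (message, score) for each message its language model flags as hate speech."""
    for msg in stream:
        scorer = scorers.get(msg.language)
        if scorer is None:
            continue  # no model for this language: skip rather than guess
        score = scorer(msg.text)
        if score >= threshold:
            yield msg, score


def make_keyword_scorer(lexicon: set) -> Scorer:
    """Hypothetical stand-in for a trained model: share of tokens found in a lexicon."""
    def score(text: str) -> float:
        tokens = text.lower().split()
        return sum(t in lexicon for t in tokens) / len(tokens) if tokens else 0.0
    return score


# Usage: one scorer per language, and a running count of flagged messages per source.
lexicon = {"slur_a", "slur_b"}  # placeholder tokens, not a real lexicon
scorers = {lang: make_keyword_scorer(lexicon) for lang in ("es", "it", "el")}
stream = [
    Message("twitter", "es", "slur_a slur_b migrantes"),
    Message("youtube", "it", "benvenuti rifugiati"),
    Message("blog", "el", "slur_a"),
]
flagged_per_source = Counter(m.source for m, _ in monitor(stream, scorers))
```

Because `monitor` is a generator, the same code works on an unbounded stream of messages: each one is scored as it arrives, and nothing is held in memory except the running counts.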

Third, geo-localized databases of both hate speech messages and hate crimes will be generated for the three countries of the consortium. At this stage, PHARM will model hate crime on the basis of the descriptive features of hate speech against refugees and migrants, using cutting-edge machine learning algorithms to model the occurrence of physical and verbal aggressions against migrants and refugees and to predict future hate crime episodes.
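The idea of linking the two geo-localized databases can be illustrated with a deliberately simple stand-in: we aggregate records into weekly counts per region and fit an ordinary least squares line relating hate speech volume in one week to hate crimes recorded the following week. The one-week lag, the `(region, week)` keys, the toy counts and the choice of OLS are all assumptions made for illustration; the post describes PHARM's own models only as cutting-edge ML algorithms.

```python
from collections import Counter


def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("hate speech counts are constant; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def lagged_pairs(speech: Counter, crimes: Counter, regions: list[str], n_weeks: int):
    """Pair hate speech volume in week t with hate crimes recorded in week t + 1, per region."""
    xs, ys = [], []
    for region in regions:
        for week in range(n_weeks - 1):
            xs.append(speech[(region, week)])     # missing keys count as 0
            ys.append(crimes[(region, week + 1)])
    return xs, ys


# Toy geo-localized records: each entry is a (region, week) key; Counter turns them into counts.
speech = Counter([("A", 0)] * 10 + [("A", 1)] * 20 + [("B", 1)] * 30)
crimes = Counter([("A", 1)] + [("A", 2)] * 2 + [("B", 2)] * 3)
a, b = fit_ols(*lagged_pairs(speech, crimes, ["A", "B"], n_weeks=3))
```

With these toy counts the fitted slope is approximately 0.1 crimes per hate message and the intercept approximately 0, which simply reflects how the toy data were constructed. A real model would add many more features (message content, source, location, time) and a count-appropriate learning method.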

Fourth, mechanisms for predicting and preventing hate crimes will be developed from these databases, in the form of a geo-localized hate crime early warning system driven by the recorded occurrence of hate speech.

Diagram of the PHARM geo-localized early warning system
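One simple way to turn recorded hate speech into geo-localized alerts is a per-region threshold rule that compares the latest week's volume with a recent baseline. The sketch below uses a mean-plus-k-standard-deviations rule; the rule itself, the four-week window, k = 2, the sigma floor and the region names are hypothetical choices, not the actual logic of the PHARM early warning system.

```python
from statistics import mean, pstdev


def early_warnings(weekly_speech: dict[str, list[int]], k: float = 2.0,
                   baseline_weeks: int = 4, min_sigma: float = 1.0) -> list[str]:
    """Flag regions whose latest weekly hate speech count is anomalously high.

    The latest week is compared with the mean of the preceding `baseline_weeks` weeks
    plus k standard deviations. Sigma is floored at `min_sigma` so that a flat,
    near-zero baseline does not trigger an alert on a single extra message.
    """
    alerts = []
    for region, series in weekly_speech.items():
        if len(series) <= baseline_weeks:
            continue  # not enough history to build a baseline
        baseline = series[-baseline_weeks - 1:-1]
        mu, sigma = mean(baseline), max(pstdev(baseline), min_sigma)
        if series[-1] > mu + k * sigma:
            alerts.append(region)
    return alerts


alerts = early_warnings({
    "region_a": [10, 12, 11, 9, 40],  # sudden spike in the latest week
    "region_b": [10, 12, 11, 9, 12],  # within normal variation
    "region_c": [5, 5],               # too little history
})
```

Here only `region_a` is flagged. A rule like this is easy to audit, which matters for an alerting tool; a learned model such as the one sketched for the modelling step could replace the fixed threshold once enough labelled history exists.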

Finally, awareness-raising and moderation activities based on data journalism and narrative persuasion will target the main sources of hate speech identified, in order to counteract the effects of mass online hate speech. Specifically, data-driven news pieces and first-person testimonies will be used to prevent and counter the rise of online hate against refugees and migrants and, in turn, to reduce hate crimes against these groups in the three southern European countries. PHARM will also create and disseminate counter-narrative fictional stories based on narrative persuasion, and will evaluate their effect in reducing the production of hate speech.
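Evaluating counter-narratives could, for example, be framed as a comparison between people exposed to a counter-narrative story and a control group. The sketch below runs a one-sided permutation test on the difference in mean hate speech output between the two groups. The study design, the outcome measure (here, hypothetical counts of hateful comments per participant) and the toy numbers are assumptions, since the post does not describe how PHARM will measure these effects.

```python
import random
from statistics import mean


def permutation_test(control: list[float], treated: list[float],
                     n_perm: int = 5000, seed: int = 0) -> tuple[float, float]:
    """One-sided test that the treated group produces less hate speech than the control group.

    Returns (observed difference in means, permutation p-value).
    """
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    observed = mean(control) - mean(treated)
    pooled = list(control) + list(treated)
    n_c = len(control)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random under the null hypothesis
        if mean(pooled[:n_c]) - mean(pooled[n_c:]) >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical hateful comments per participant after reading a control vs a counter-narrative story.
observed, p_value = permutation_test([5, 6, 7, 6, 5, 7], [1, 2, 1, 2, 1, 2])
```

A permutation test avoids assuming normally distributed outcomes, which suits small, skewed count data like comment tallies.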

With this project, the Observatory for Audiovisual Contents team aims to complete more than four years of research, producing the largest-scale strategy for detecting online hate speech against refugees and migrants in Spanish, Italian and Greek; building the largest database of online hate speech messages and hate crimes committed against these groups in southern Europe; further reducing both the production of online hate speech and, accordingly, the hate crimes themselves; and, lastly, developing a computational tool for predicting possible hate crimes in the three countries involved.

For more information, visit the project website:

Written by: Carlos Arcila Calderón and Javier J. Amores, Observatory for Audiovisual Contents (OCA), University of Salamanca

Revised by: Matteo Bohm, Luca Pappalardo, Laura Pollacci