
SoBigData Articles

Entity-linking suite for texts, posts and queries

After about eight years of algorithm design, software development, tuning, and large-scale experimental testing, I am pleased to announce in this post the public availability of a robust API that offers powerful entity-linking functionalities based on the tools we have published in recent years: TagMe, WAT, SWAT, and SMAPH.

All these tools can now be freely accessed via the EU SoBigData Infrastructure.

Each of the four tools is described on that page; here I summarize their main characteristics:

TagMe was designed and published in 2010 (ACM CIKM, IEEE Software) and is probably the best-known and most widely used publicly accessible entity linker worldwide. To date, it has served more than 800 million queries, with peaks of tens of millions of queries per month. TagMe is currently used in several other software tools; one notable example is Ruby Star, which was shortlisted among the finalists of the Alexa Prize 2017 (paper).
WAT is a sophisticated evolution of TagMe based on the Wikipedia Knowledge Graph and on algorithms that use Word2Vec and Entity2Vec techniques. WAT significantly improves TagMe's efficacy and efficiency on well-formed texts (ERD@SIGIR 2014).
SWAT is an add-on to TagMe and WAT that extends entity linking with saliency evaluation: one can not only annotate an input document with entities drawn from Wikipedia (à la TagMe or WAT), but also assign to every annotated entity a saliency score that can subsequently be used for post-processing tasks such as document clustering and classification, entity filtering, and so on (NLDB 2017).
SMAPH is an entity annotator for open-domain web search queries. It is based on a second-order approach that piggybacks on a web search engine (either Bing or Google) to alleviate the noise and irregularities that characterize the language of queries, placing each query in a larger context in which it is easier to interpret (WWW 2016). SMAPH won the short track of the ERD@SIGIR 2014 benchmark (ERD@SIGIR 2014).
Several other papers, by my group and by other researchers, have shown the power of entity-linking tools for text-processing tasks such as clustering (ACM WSDM 2012), classification (ECIR 2012), hashtag processing (AAAI ICWSM 2015), entity relatedness (ACM CIKM 2017), and expert retrieval (ArXiv 2018), to mention only our own publications. This research has also contributed to the development of GERBIL, a testing platform for entity linkers (WWW 2015, WWW 2013).
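To make the SWAT description more concrete, here is a minimal Python sketch of saliency-based entity filtering, one of the post-processing tasks mentioned above. It assumes that each annotation carries a text span, a Wikipedia entity title, and a saliency score in [0, 1]. The `Annotation` record, the `filter_salient` helper, the 0.5 threshold, and the sample values are all hypothetical illustrations; they do not reflect the actual output format of the SWAT API.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical annotation record: the field names are illustrative
# assumptions, NOT the actual output format of the SWAT API.
@dataclass
class Annotation:
    spot: str        # text span in the input document
    entity: str      # Wikipedia title of the linked entity
    saliency: float  # saliency score, assumed here to lie in [0, 1]


def filter_salient(annotations: List[Annotation], threshold: float = 0.5) -> List[Annotation]:
    """Keep only annotations whose saliency is at least `threshold`,
    sorted from most to least salient."""
    kept = [a for a in annotations if a.saliency >= threshold]
    return sorted(kept, key=lambda a: a.saliency, reverse=True)


if __name__ == "__main__":
    # Made-up annotations for a short news snippet, for illustration only.
    doc = [
        Annotation("Pisa", "Pisa", 0.82),
        Annotation("tower", "Leaning_Tower_of_Pisa", 0.91),
        Annotation("Tuesday", "Tuesday", 0.07),
    ]
    for a in filter_salient(doc, threshold=0.5):
        print(a.entity, a.saliency)
```

Sorting the surviving annotations by saliency also yields a ranked list of a document's main entities, which could then serve as a compact feature set for clustering or classification.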

Several master's and PhD students contributed to the design and implementation of these tools, and they are listed as authors of the papers mentioned above. In alphabetical order, they are: Paolo Cifariello, Marco Cornolti, Francesco Piccinno, Marco Ponza, Roberto Santoro, Ugo Scaiella, Daniele Vitale.

I thank all our funders: Google (two Faculty Research Awards, in 2010 and 2012), a PRIN MIUR grant, the EU grant "SoBigData Infrastructure" (INFRAIA-1-2014-2015, agreement #654024), and a Bloomberg Data Science Research Grant (2017).

Enjoy these entity-linkers via the EU SoBigData Infrastructure!

Article published on LinkedIn