As a contribution to the scientific community working in the field of entity annotation, we developed a framework for comparing text annotators: systems that, given a text document, aim to find the entities the text is about, identified as Wikipedia pages. The BAT-Framework, written in Java, comes with a formal framework that defines a set of problems, the way systems can be compared with each other, and a set of measures that, extending classic IR measures, fairly and fully compare the features of entity annotators.
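
To make the idea of "classic IR measures" concrete, the sketch below computes precision, recall and F1 between the set of Wikipedia pages returned by an annotator and a gold-standard set. It is only an illustration under simple assumptions: entities are represented as page titles and matched exactly, and the class and method names (`AnnotationMetrics`, `precision`, `recall`, `f1`) are hypothetical, not part of the BAT-Framework API. The measures defined by the framework extend these basic ones.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, NOT the BAT-Framework API: classic IR measures
// computed over sets of Wikipedia page titles with exact matching.
final class AnnotationMetrics {

    // Number of entities found by the system that are also in the gold standard.
    private static int truePositives(Set<String> system, Set<String> gold) {
        Set<String> common = new HashSet<>(system);
        common.retainAll(gold);
        return common.size();
    }

    // Fraction of the system's entities that are correct (0 if it returned none).
    static double precision(Set<String> system, Set<String> gold) {
        if (system.isEmpty()) return 0.0;
        return (double) truePositives(system, gold) / system.size();
    }

    // Fraction of the gold-standard entities that the system found (0 if gold is empty).
    static double recall(Set<String> system, Set<String> gold) {
        if (gold.isEmpty()) return 0.0;
        return (double) truePositives(system, gold) / gold.size();
    }

    // Harmonic mean of precision and recall (0 if both are 0).
    static double f1(Set<String> system, Set<String> gold) {
        double p = precision(system, gold);
        double r = recall(system, gold);
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

Exact matching on page titles is the simplest possible notion of a correct annotation; a comparison framework can plug in weaker or stronger match relations without changing the shape of these formulas.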

Bicriteria Data Compression (BcZip)

Bicriteria Data Compression is a novel compression paradigm that allows the user to trade decompression time against compressed size in a principled way. In short, the tool lets you specify a bound on the decompression time (say, 800 ms) and compresses the file so that the decompression time stays below that bound while the compressed size is minimized (or vice versa).
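
The objective described above can be illustrated with a toy selection problem: given a handful of candidate encodings of the same file, each with a known compressed size and an estimated decompression time, pick the smallest one whose decompression time respects the bound. This is a sketch of the bicriteria objective only, not of how BcZip actually computes its output; the `Candidate` type, its fields and the method name are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Toy illustration of the bicriteria objective; NOT the BcZip algorithm.
final class BicriteriaChoice {

    // Hypothetical description of one way of compressing the same file.
    static final class Candidate {
        final String name;
        final long compressedBytes;
        final double decompressionMillis;

        Candidate(String name, long compressedBytes, double decompressionMillis) {
            this.name = name;
            this.compressedBytes = compressedBytes;
            this.decompressionMillis = decompressionMillis;
        }
    }

    // Returns the candidate with the smallest compressed size among those whose
    // decompression time does not exceed maxMillis, or empty if none qualifies.
    static Optional<Candidate> smallestWithinTimeBound(List<Candidate> candidates,
                                                       double maxMillis) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (c.decompressionMillis > maxMillis) {
                continue; // violates the time bound
            }
            if (best == null || c.compressedBytes < best.compressedBytes) {
                best = c;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Swapping the roles of the two fields (bounding the size and minimizing the time) gives the "vice versa" variant mentioned above.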

Code is available here.


TAGME is a “topic annotator” that identifies meaningful sequences of words in a short text and links them to a pertinent Wikipedia page. This stunning contextualization has implications that go far beyond enriching the text with explanatory links, because it concerns, in some way, the understanding of the topics dealt with in the text itself.