Correlation Pattern Discovery

The Correlation Pattern Discovery process is based on two analytical tools: (i) a method to estimate the presence of people in different geographical areas; and (ii) a method to extract time- and space-constrained sequential patterns capable of capturing correlations among geographical areas in terms of significant co-variations of the estimated presence. The result is a set of patterns describing relationships between highly correlated areas. Experiments were carried out over Paris and the whole of France using call data records from a well-known telecommunication operator.
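
To make the notion of "co-variation of the estimated presence" concrete, here is a minimal Python sketch. It is not the sequential pattern mining method described above: it only computes the Pearson correlation between the period-to-period changes in presence of two areas, optionally with a time lag. The function names, the `lag` parameter, and the sample numbers are all assumptions made for illustration.

```python
from math import sqrt
from typing import List, Sequence


def diffs(series: Sequence[float]) -> List[float]:
    """Change in estimated presence between consecutive time slots."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def covariation(presence_a: Sequence[float],
                presence_b: Sequence[float],
                lag: int = 0) -> float:
    """Correlate changes in area A with changes in area B `lag` slots later.

    Both series are assumed to have the same length and time resolution.
    """
    da, db = diffs(presence_a), diffs(presence_b)
    if lag > 0:
        da, db = da[:-lag], db[lag:]
    return pearson(da, db)


# Hypothetical presence estimates for three areas over five time slots.
area_a = [100, 120, 150, 130, 90]
area_b = [50, 60, 75, 65, 45]     # rises and falls together with area_a
area_c = [90, 70, 40, 60, 100]    # moves in the opposite direction

same_direction = covariation(area_a, area_b)
opposite_direction = covariation(area_a, area_c)
```

A value close to 1 suggests the two areas fill and empty together, while a value close to -1 suggests that people leave one area as they arrive in the other.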

BAT-Framework

As a contribution to the scientific community working in the field of entity annotation, we developed a framework to compare text annotators: systems that, given a text document, aim at finding the entities the text is about, identified as Wikipedia pages. The BAT-Framework, written in Java, comes with a formal framework that defines a set of problems, the way systems can be compared to each other, and a set of measures that, extending classic IR measures, fairly and fully compare the features of entity annotators.
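
As an illustration of the classic IR measures that such a comparison starts from, the following Python sketch computes precision, recall, and F1 between the annotations produced by a system and a gold standard. It is not the BAT-Framework API (which is written in Java). The `Annotation` tuple layout, the exact-match criterion, and the sample data are assumptions; the framework defines its own, more refined match relations.

```python
from typing import Set, Tuple

# Assumed representation: (start offset, end offset, Wikipedia page title).
Annotation = Tuple[int, int, str]


def precision_recall_f1(predicted: Set[Annotation],
                        gold: Set[Annotation]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two annotation sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold standard and system output for one document.
gold = {(0, 5, "Paris"), (10, 16, "France")}
predicted = {(0, 5, "Paris"), (10, 16, "French_language"), (20, 25, "Seine")}

p, r, f1 = precision_recall_f1(predicted, gold)
```

Here only one of the three predicted annotations matches the gold standard exactly, so precision is 1/3 and recall is 1/2.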

Bicriteria Data Compression (BcZip)

Bicriteria Data Compression is a novel compression paradigm that lets the user trade decompression time against compressed size in a principled way. In short, the tool lets you specify a bound on the decompression time (say, 800 msecs) and compresses the file so that the decompression time stays below that bound while the compressed size is minimized (or vice versa).

Code is available here.
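
The bicriteria trade-off itself can be sketched in a few lines of Python. This is only a toy model of the constrained optimization, not BcZip's algorithm or interface: it assumes a small set of candidate encodings has already been produced, each with a known compressed size and estimated decompression time, and simply picks the best feasible one. The `Candidate` type, the function names, and the numbers are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str          # hypothetical label for one compressed encoding
    size_bytes: int    # compressed size
    decomp_ms: float   # estimated decompression time in milliseconds


def smallest_within_time(cands: Sequence[Candidate],
                         time_bound_ms: float) -> Optional[Candidate]:
    """Minimize compressed size subject to a decompression-time bound."""
    feasible = [c for c in cands if c.decomp_ms <= time_bound_ms]
    return min(feasible, key=lambda c: c.size_bytes) if feasible else None


def fastest_within_size(cands: Sequence[Candidate],
                        size_bound_bytes: int) -> Optional[Candidate]:
    """The reverse problem: minimize decompression time under a size bound."""
    feasible = [c for c in cands if c.size_bytes <= size_bound_bytes]
    return min(feasible, key=lambda c: c.decomp_ms) if feasible else None


# Made-up candidates: smaller encodings take longer to decompress.
candidates = [
    Candidate("A", 1_000_000, 1200.0),
    Candidate("B", 1_300_000, 750.0),
    Candidate("C", 1_800_000, 400.0),
]

best = smallest_within_time(candidates, 800.0)
```

With an 800 msecs bound, candidate A is too slow, and B is chosen over C because it is smaller. If no candidate meets the bound, the function returns `None`.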

TAGME

TAGME is a “topic annotator” that identifies meaningful sequences of words in a short text and links them to a pertinent Wikipedia page. This stunning contextualization has implications that go far beyond enriching the text with explanatory links, because it concerns, in some way, the understanding of the topics dealt with in the text itself.