SoBigData Event

The OpenAIRE Datathon

The OpenAIRE datathon aims to encourage developers and data scientists to analyse the OpenAIRE Information Space, with the intent of improving its consumption by users and third-party services. The OpenAIRE Information Space consists of a scholarly communication graph interlinking publications, datasets, software, research organizations, funders, and projects. The graph is built by harvesting metadata from around 3000 data providers, harmonizing that metadata, and keeping or inferring links between the graph objects it describes. Links are inferred by text-mining a pool of around 6 million Open Access article full-texts. The graph contains around 60M objects, is openly accessible via APIs and a web portal, and is used today to offer research impact statistics (e.g. number of products linked to given funders), Open Access trends (e.g. Open Access ratio of products published by given funders), and discovery of interlinked scholarly products (e.g. articles linked to datasets, software linked to articles for communities).
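As a rough illustration of the kind of statistic mentioned above, the following sketch computes an Open Access ratio per funder over a handful of toy records. The record layout and field names (`funder`, `open_access`) are assumptions made for this example only; they are not the OpenAIRE schema or API.

```python
from collections import defaultdict

# Toy records. Field names are illustrative assumptions, not the OpenAIRE schema.
products = [
    {"id": "p1", "type": "publication", "funder": "FunderA", "open_access": True},
    {"id": "p2", "type": "dataset", "funder": "FunderA", "open_access": False},
    {"id": "p3", "type": "publication", "funder": "FunderB", "open_access": True},
    {"id": "p4", "type": "software", "funder": "FunderA", "open_access": True},
]


def open_access_ratio(records):
    """Return, for each funder, the share of its products that are Open Access."""
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for record in records:
        totals[record["funder"]] += 1
        if record["open_access"]:
            open_counts[record["funder"]] += 1
    return {funder: open_counts[funder] / totals[funder] for funder in totals}


ratios = open_access_ratio(products)
for funder, ratio in sorted(ratios.items()):
    print(f"{funder}: {ratio:.2f}")
```

On real data the same aggregation would run over records obtained from the graph, with the grouping key and access flag mapped to whatever fields the chosen export provides.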

The datathon encourages teams of computer scientists, data scientists and experts from other fields to join the challenge of studying and analysing the OpenAIRE graph to enhance its discovery and statistical capabilities, in order to better serve the mission of Open Science. Buzz-topics leading the challenge are:

  • Enabling multi-disciplinary or discipline-specific discovery or stats functionality
  • Novel techniques to enable measurement of scientific impact, e.g. counters, links, provenance
  • Novel techniques to measure scientific impact, e.g. measures of quality
  • Enabling reproducibility, e.g. re-use oriented metadata, meaningful interlinking of objects
  • De-duplication of the information space, e.g. disambiguation of authors, disambiguation of organizations
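To make the de-duplication topic concrete, here is a minimal sketch that groups organization names sharing the same normalized form. It is only a baseline under stated assumptions: the sample names are invented, and exact matching on a normalized key is a stand-in for the more robust similarity or clustering methods a team might actually use.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name):
    """Strip accents, lowercase, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def group_duplicates(names):
    """Return groups of names that collapse to the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Invented sample names, used only for illustration.
organizations = [
    "Université de Exemple",
    "Universite de Exemple",
    "UNIVERSITE DE EXEMPLE.",
    "Example Research Institute",
]

duplicates = group_duplicates(organizations)
for group in duplicates:
    print(group)
```

Exact key matching misses abbreviations, translations, and typos, so it is best read as a first blocking step before a finer-grained comparison.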

Data analysis can be performed using any cutting-edge methodology and technology, with the intention of improving quality and deriving statistics and enrichments from an original data collection, in order to make it more usable and interesting to the intended users. Teams can take advantage of the data analysis tools made available through this platform.

The three teams with the most outstanding and innovative solutions will win the datathon and be awarded a prize.