The Linguistic Computing Laboratory (LCL) is part of the Computer Science Department of the Sapienza University of Rome. The group conducts state-of-the-art research in the area of Natural Language Processing.

The group devises and develops algorithms and methods in machine learning, pattern matching and recognition, and natural language processing. These are applied to problems such as automatic text understanding; construction, learning and population of ontologies; semantic text indexing and classification; query expansion; and question answering.

Research fields include:
  • Multilingual Word Sense Disambiguation and Induction
  • Multilingual Entity Linking
  • Broad and Deep Learning
  • Distributional semantic similarity
  • Ontology Learning and Population
  • Large-Scale Knowledge Acquisition
  • Semantic and Statistical Machine Translation
  • Semantic Information Retrieval
  • Social Network Analysis and Mining
Pre-Ph.D.+Ph.D. Position in Multilingual NLP (Semantic Parsing)

One pre-Ph.D. research position (with the possibility of starting a Ph.D. in 2018 on the same salary with a privileged track) in Natural Language Processing is open.

The position is part of a new 5-year ERC Consolidator Grant funded by the European Research Council (ERC) and headed by prof. Roberto Navigli, following the success of his MultiJEDI ERC Starting Grant (http://multijedi.org). The successful candidate will participate in a frontier research project aimed at designing and investigating novel neural network architectures for multilingual disambiguation and semantic parsing, and will work in the vibrant environment of a leading and highly active international research team comprising 3 faculty members, 1 post-doc and 6 Ph.D. students. The group has published dozens of papers in top-tier venues in the field of computational linguistics and artificial intelligence.

Interested students and collaborators in the research group have the option to interact with Babelscape, a Sapienza startup company founded by prof. Navigli which brings research in multilingual Natural Language Processing to the market and makes research projects, such as the award-winning BabelNet, sustainable in the long term. Babelscape is currently working for key players in different fields, including multinational companies, and national and international public bodies. Around 15 developers and researchers are working in the company.

REQUIREMENTS/QUALIFICATIONS

The successful candidate will work actively on new directions in deep learning and neural networks for multilingual lexical semantic tasks such as Word Sense Disambiguation, Entity Linking and semantic parsing in arbitrary languages, starting from successful approaches and resources for multilingual lexical semantics such as BabelNet (winner of the 2017 Artificial Intelligence prominent paper award), the Multilingual Wikipedia Bitaxonomy, SensEmbed, NASARI explicit and embedded vectors, state-of-the-art neural Word Sense Disambiguation and train-o-matic.

The candidate is expected to have:
  • An M.Sc. or equivalent in Computer Science, Computational Linguistics/NLP, Mathematics or related fields.
  • Good programming skills in Python and/or Java (there is the option to attend a Python/Java course at Sapienza in the first contract year).
  • Fluent English. Knowledge of other languages (especially Asian languages) is more than welcome. Knowledge of Italian is NOT a requirement.
  • Knowledge of current neural network models, especially recurrent neural networks such as LSTMs, and tools for neural networks (e.g. Tensorflow, Keras, Torch, Theano, etc.) is a plus.
  • Publications in Computational Linguistics, participation in summer schools and other experiences are a plus.
INFORMATION
  • Application deadline: early October 2017
  • Interviews will take place via Skype around the end of October/beginning of November
  • Starting date: as early as possible, ideally on December 1st, 2017
  • Duration: 1+3 years
  • Salary: 25000 euros per annum. Note that this type of research contract is exempt from taxes, while including social insurance (25000 euros per annum corresponds to around 1560 euros net per month).

HOW TO APPLY

Information can be requested by email to Roberto Navigli (navigli@di.uniroma1.it). The application requires a brief motivation letter, a detailed CV and contact details for up to three references. Please include the job reference [LCL1-2017] in the subject line.

Candidates attending EMNLP this week are welcome for an informal chat on the position.
ERC Consolidator Grant!

Prof. Roberto Navigli has been awarded a prestigious ERC Consolidator Grant in Computer Science and Informatics. Stay tuned for important updates and open positions!

NASARI and MultiWiBi in Artificial Intelligence

Two new Artificial Intelligence Journal articles from LCL, on NASARI and MultiWiBi:

José Camacho Collados, Mohammad Taher Pilehvar and Roberto Navigli. Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence (2016), volume 240, pages 36-64.

Tiziano Flati, Daniele Vannella, Tommaso Pasini and Roberto Navigli. MultiWiBi: The multilingual Wikipedia bitaxonomy project. Artificial Intelligence (2016), volume 241, pages 66-102.
ACL Tutorial 2016: Semantic Representations of Word Senses and Concepts

The LCL members José Camacho Collados, Ignacio Iacobacci, Roberto Navigli, and Mohammad Taher Pilehvar (currently at the University of Cambridge) will be presenting a tutorial on “Semantic Representations of Word Senses and Concepts” in Berlin at the ACL conference (August 7th, 2016).
José Camacho Collados received a prestigious Google PhD Fellowship!

We are proud to announce that José Camacho Collados has been awarded the 2016 Google Fellowship in Natural Language Processing!
BabelNet 3.7 is now out!

We are happy to announce the release of a new version of BabelNet.

BabelNet (http://babelnet.org) is the largest multilingual encyclopedic dictionary and semantic network created by means of the seamless integration of the largest multilingual Web encyclopedia - i.e., Wikipedia - with the most popular computational lexicon of English - i.e., WordNet, and other lexical resources such as Wiktionary, OmegaWiki, Wikidata, Open Multilingual WordNet, Wikiquote, VerbNet, Microsoft Terminology, GeoNames, WoNeF, ImageNet, ItalWordNet, Open Dutch WordNet and FrameNet. The integration is performed via an automatic linking algorithm and by filling in lexical gaps with the aid of Machine Translation. The result is an encyclopedic dictionary that provides Babel synsets, i.e., concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations.

Version 3.7 comes with the following features:
  • New resource integrated: FrameNet.
  • More than 2500 Babel synsets identified as key concepts.
  • Mappings with several versions of WordNet now integrated (from 1.6 to 3.0).
  • More than 2.6 million Babel synsets labeled with domains (were 1,558,806 in v3.6).

More statistics are available at: http://babelnet.org/stats

BabelNet was part of the MultiJEDI project originally funded by the European Research Council and headed by Prof. Roberto Navigli at the Linguistic Computing Laboratory of the Sapienza University of Rome. BabelNet is now a self-sustained project. It is, and always will be, free for research purposes, including download. Babelscape, a Sapienza startup company, is BabelNet's commercial support arm, thanks to which the project will be continued and improved over time.

Enjoy!
BabelNet in TIME magazine!

BabelNet features prominently in TIME magazine, in the inspiring article "Redefining the modern dictionary" by Katy Steinmetz. The article talks about the new age of innovative and up-to-date lexical knowledge resources available on the Web, and describes in some detail how BabelNet is playing a leading role in this 21st century scenario!
BabelNet 3.6 is now out!

As the final output of the "MultiJEDI" Starting Grant (http://multijedi.org), funded by the European Research Council and headed by Prof. Roberto Navigli, the Linguistic Computing Laboratory of the Sapienza University of Rome is proud to announce the release of BabelNet 3.6.

Version 3.6 comes with the following features:
  • New resources integrated: ItalWordNet, Open Dutch WordNet.
  • 625 million new senses (for a total of 745 million Babel senses, considerably increasing language coverage).
  • 6.4 million surface forms for Babel synsets.
  • 3.5 million YAGO external links.
  • Improved version of the Java and HTTP RESTful API (http://babelnet.org/download)
  • For fans of offline processing with non-commercial purposes: downloadable offline indices starting shortly!

More statistics are available at: http://babelnet.org/stats

Enjoy!
The Luxembourg BabelNet Workshop

2-3 March, 2016, Luxembourg
http://babelnet.org/lux

Schuman Building of the European Parliament, Hemicycle. 2929 Luxembourg

Organized by:
EU Publications Office, European Commission, European Parliament

We are proud to announce the Luxembourg BabelNet workshop. This event is a technical workshop on BabelNet, the largest multilingual encyclopedic dictionary and semantic network -- now also a huge knowledge base -- covering 14 million concepts and named entities in 272 languages. The workshop will take place over two days. The first day is a technical guided tour, including industrial applications. The second day consists of four case studies of resource mapping to BabelNet.

The workshop is open to all comers. It will be held in English and attendance is free, up to the room capacity.

This is an IT technical workshop; the ideal background of participants is computer science and natural language processing. However, participants from other backgrounds with an interest in the IT aspects of their specialty (like authors, translators and publishers) should also benefit, though they must be aware of the technical IT nature of the workshop. Regardless of their background, participants will gain a deep understanding of BabelNet. At the least, they will be able to properly use even the most advanced functionalities, such as traversing the network, multilingual disambiguation and high-performance mapping; some may be able to contribute at the conceptual level; and, at the higher end, some may contribute to and get involved in the development.
BabelNet 3.5 is now out!

As an output of the "MultiJEDI" Starting Grant, funded by the European Research Council and headed by Prof. Roberto Navigli, the Linguistic Computing Laboratory http://lcl.uniroma1.it of the Sapienza University of Rome is proud to announce the release of BabelNet 3.5.

BabelNet (http://babelnet.org) is a very large multilingual encyclopedic dictionary and semantic network created by means of the seamless integration of the largest multilingual Web encyclopedia - i.e., Wikipedia - with the most popular computational lexicon of English - i.e., WordNet, and other lexical resources such as Wiktionary, OmegaWiki, Wikidata, Open Multilingual WordNet, Wikiquote, VerbNet, Microsoft Terminology, GeoNames, WoNeF, and ImageNet. The integration is performed via an automatic linking algorithm and by filling in lexical gaps with the aid of Machine Translation. The result is an encyclopedic dictionary that provides Babel synsets, i.e., concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations.

Version 3.5 comes with the following features:
More statistics are available at: http://babelnet.org/stats

Enjoy!
BabelNet received the prestigious META prize 2015!!!

BabelNet - for groundbreaking work in overcoming language barriers through a multilingual lexicalised semantic network and ontology making use of heterogeneous data sources. The resulting encyclopedic dictionary provides concepts and named entities lexicalised in many languages, enriched with semantic relations.
Babelfy 1.0 is now out!

Babelfy is a joint, unified approach to Word Sense Disambiguation and Entity Linking in the language of your choice. The system is based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Its performance on standard word sense disambiguation and entity linking tasks is on a par with, or surpasses, that of language- and task-specific state-of-the-art systems.
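To give a feel for what a densest subgraph heuristic does, here is a minimal Python sketch of generic greedy peeling: repeatedly drop the least-connected node and remember the densest graph seen along the way. This is an illustration only and not Babelfy's actual implementation, whose graph construction and pruning rules differ. The function name `densest_subgraph`, the toy node labels and the toy edges are all assumptions made for this example.

```python
from collections import defaultdict


def densest_subgraph(edges):
    """Greedy peeling approximation of the densest subgraph.

    Density is defined here as (number of edges) / (number of nodes).
    Hypothetical helper for illustration; not part of any Babelfy API.
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    nodes = set(adj)
    num_edges = sum(len(neigh) for neigh in adj.values()) // 2
    best_nodes = set(nodes)
    best_density = num_edges / len(nodes) if nodes else 0.0

    while len(nodes) > 1:
        # Peel off the node with the fewest connections
        # (ties broken by label so the result is deterministic).
        victim = min(nodes, key=lambda n: (len(adj[n]), str(n)))
        for neighbour in adj[victim]:
            adj[neighbour].discard(victim)
        num_edges -= len(adj[victim])
        del adj[victim]
        nodes.remove(victim)

        # Keep the densest intermediate graph seen so far.
        density = num_edges / len(nodes)
        if density > best_density:
            best_density = density
            best_nodes = set(nodes)

    return best_nodes, best_density


# Toy graph of candidate meanings (made-up labels): three mutually
# related "finance" senses and two loosely attached alternatives.
toy_edges = [
    ("bank#finance", "loan#finance"),
    ("bank#finance", "interest#finance"),
    ("loan#finance", "interest#finance"),
    ("bank#river", "interest#curiosity"),
]
coherent, density = densest_subgraph(toy_edges)
print(sorted(coherent), density)
```

On this toy input the three mutually connected "finance" candidates survive, which is the intuition behind selecting a high-coherence interpretation: meanings that support one another form a dense region of the graph, while isolated alternatives are peeled away.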

New features in Babelfy v1.0:
  • 271 languages covered plus a novel language-agnostic setting!

  • Available via easy-to-use Java and HTTP RESTful APIs.

  • The input context can be either a text or a bag of words where you can mix up languages!

  • Plenty of tunable parameters for the disambiguation procedure, such as setting your own threshold, enabling multiple scored annotations of the same fragment, restricting the annotations to WordNet, Wikipedia or BabelNet, specifying the offsets that you want to be linked, providing pre-annotated tokens as disambiguation context, and disabling/enabling the most common sense heuristic, multi-word expressions and the densest subgraph heuristic.

  • Three different scores are now output: the disambiguation score, a coherence score and a global relevance score.

  • Disambiguation and entity linking are performed using BabelNet, thereby implicitly annotating according to several different inventories such as WordNet, Wikipedia, Wiktionary, OmegaWiki, etc.

BabelNet 3.0 is now out!

BabelNet (http://babelnet.org) is a very large multilingual encyclopedic dictionary and semantic network created by means of the seamless integration of the largest multilingual Web encyclopedia - i.e., Wikipedia - with the most popular computational lexicon of English - i.e., WordNet, and other lexical resources such as Wiktionary, OmegaWiki, Wikidata, and the Open Multilingual WordNet. The integration is performed via an automatic linking algorithm and by filling in lexical gaps with the aid of Machine Translation. The result is an encyclopedic dictionary that provides Babel synsets, i.e., concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations.

Version 3.0 comes with the following features:
  • 271 languages now covered!

  • New Java and HTTP RESTful API

  • Fully taxonomized thanks to the seamless integration of our Wikipedia Bitaxonomy

  • 13.7 million meanings (Babel synsets)

  • 40.3 million textual definitions

Tutorials in 2014

We are presenting tutorials at four different conferences:
Babelfy released!

We are happy to announce the release of Babelfy which is a unified approach to multilingual Word Sense Disambiguation and Entity Linking.
Babelfy website
ERC Starting Grant!

Prof. Roberto Navigli has been awarded a prestigious ERC Starting Grant in Computer Science and Informatics (2011-2016). The project, called MultiJEDI, will focus on multilingual semantic processing. Many positions are open on the project.