Workshop on Training and Evaluation Data for Italian Large Language Models
This inaugural workshop marks the initial phase of the development of Italian Large Language Models (LLMs).
When: 18/12/2023
Where: DIAG, Sapienza Università di Roma - Aula Magna (first floor), Via Ariosto 25, Roma
LINK: https://uniroma1.zoom.us/j/86905776052?pwd=QWd4QzZhczlTSUdDeWZVdVhkNHlaZz09
The event starts at 2 p.m. CET.
This inaugural workshop, which focuses on the development of Large Language Models (LLMs) for the Italian language, marks the initial phase of constructing a Large Multimodal Model (LMM) within the Transversal Project "Vision, Language, and Multimodal Challenges", part of the large-scale project "Future Artificial Intelligence Research" (FAIR). The workshop is organized in collaboration with the CINI AIIS (Artificial Intelligence and Information Systems) laboratory, which serves as the hub for the entire Italian AI community. The specific goal of the event is to present and discuss the collection and curation of training and evaluation datasets, the foundational step towards the realization of Italian LLMs and LMMs.
Organizers:
Roberto Navigli (Sapienza University of Rome)
Rita Cucchiara (University of Modena and Reggio Emilia; CNR)
Agenda
Introductory session: 14:00 - 14:20
- 14:00 - 14:20 | Project Introduction
Roberto Navigli, Sapienza University of Rome
Rita Cucchiara, University of Modena and Reggio Emilia; CNR
Invited Talks - 1st part: 14:20 - 16:00
- 14:20 - 14:40 | LLMs at Barcelona Supercomputing Center (slides)
Marta Villegas, Barcelona Supercomputing Center
- 14:40 - 15:00 | Data for European Large Language Models: The European Perspective (slides)
Georg Rehm, DFKI
- 15:00 - 15:20 | A Dataset Framework for Large Language Models (slides)
Malte Ostendorff, DFKI
- 15:20 - 15:40 | Assessing Reliability of Knowledge in LLMs (slides)
Barry Haddow, University of Edinburgh
- 15:40 - 16:00 | HPLT: Data and Models for European Languages (and more) (slides)
Sampo Pyysalo, University of Turku
Coffee break: 16:00 - 16:30
Invited Talks - 2nd part: 16:30 - 17:30
- 16:30 - 16:50 | Annotating Multilingual Heterogeneous Web-Based Corpora (slides)
Pedro Ortiz, DFKI
- 16:50 - 17:10 | GPT-SW3: the first LLM for the North-Germanic languages (slides)
Magnus Sahlgren, AI Sweden
- 17:10 - 17:30 | LLMs and Data Protection: General Considerations (slides)
Roberto Lattanzi, Dip. AI, Garante per la Protezione dei Dati Personali
Participant Presentations and Closing: 17:30 - 18:45
- 17:30 - 17:42 | Italian Benchmark Language Resources and Tools: EVALITA4ELG, UINAUIL and more
Viviana Patti, University of Torino
- 17:42 - 17:54 | Collecting Italian Textual Data for the Medical Domain
Bernardo Magnini, FBK
- 17:54 - 18:06 | Il Dato che non ti ho Dato chi te l'ha Dato? Building trust in data donors
Fabio Massimo Zanzotto, University of Rome Tor Vergata
- 18:06 - 18:18 | The Italian challenge to Large Acoustic Models for automatic speech recognition and synthesis
Franco Cutugno, University of Napoli Federico II
- 18:18 - 18:30 | The Weakest Link: Understanding How Data Influences ML Trustworthiness
Antonio Cinà, University of Genoa
- 18:30 - 18:45 | Closing