A Learner Dictionary of Italian Collocations

Project Description and Objectives

PRIN: Progetti di ricerca di rilevante interesse nazionale (Research Projects of Relevant National Interest) - 2022 Call, Prot. 2022HXZR5E

SSD (scientific-disciplinary sectors): GLOT-01/A (Glottology and Linguistics); INF-01 (Computer Science)

ERC sectors: SH4_8 Language learning and processing (first and second language); SH4_9 Theoretical linguistics; Computational linguistics

Participating Institutions: University for Foreigners of Perugia (lead university); University of Perugia (partner university)



Collocations are word combinations that frequently occur together and exhibit a high degree of conventionality. They play a central role in language use: research shows that native speakers rely on these conventional combinations to produce fluent, natural-sounding language. For second language (L2) learners, however, mastering such expressions can be challenging, which significantly affects both comprehension and production.

Despite the importance of collocations in second language learning, empirically based lexicographic resources specifically designed for L2 learners of Italian are still lacking. The DICI-A project (Dictionary of Italian Collocations for Learners) was developed to fill this gap by providing a collocation dictionary that supports the learning of Italian as a second language.

The main objective of the project is to create and make publicly available the DICI-A, a new lexicographic resource specifically designed for learners of Italian as a second language. The DICI-A is a digital dictionary searchable through a freely accessible interface that works on both computers and smartphones.

It serves as a reference tool built on balanced corpora of written and spoken language and on rigorous statistical methods, with collocations graded by proficiency level. The DICI-A thus sits at the intersection of lexicography, language learning, and corpus linguistics, and was developed by combining quantitative methods, qualitative evaluation, and artificial intelligence supported by human judgment.

Key Features of the Project

  • Corpus-Based Approach

    Collocations are extracted from a reference corpus of Italian, encompassing diverse textual genres and linguistic registers.

  • Quantitative and Statistical Methodologies

    Collocations are filtered and ordered based on frequency, dispersion measures across different textual genres in the corpus, and association measures.

  • Proficiency Level Assignment

    Each collocation is assigned a proficiency level based on the Common European Framework of Reference for Languages (CEFR).

  • Definitions and Examples

    Each collocation is accompanied by a definition that explains its meaning(s) and by one or more examples of use.
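To make the quantitative criteria listed above more concrete, the following is a minimal Python sketch of how a candidate word pair might be scored by frequency, dispersion across genres, and an association measure. It is an illustration under stated assumptions, not the project's actual pipeline: the page does not say which measures the DICI-A uses, so the sketch substitutes pointwise mutual information (PMI) for the association measure and the share of genres containing the pair ("range") for dispersion. The toy corpus, the sentence-level co-occurrence window, and the names `TOY_CORPUS` and `pair_stats` are all invented for the example.

```python
import math

# Toy corpus, invented for illustration only: genre -> tokenised sentences.
TOY_CORPUS = {
    "news": [["prendere", "una", "decisione"], ["il", "treno", "parte"]],
    "fiction": [["prendere", "una", "decisione"], ["una", "giornata", "difficile"]],
    "spoken": [["prendere", "il", "treno"], ["fare", "una", "domanda"]],
}


def pair_stats(corpus, w1, w2):
    """Score a candidate pair using sentence-level co-occurrence.

    Assumptions (not taken from the project description):
    - two words co-occur when they appear in the same sentence;
    - dispersion is the share of genres in which the pair occurs ("range");
    - association is pointwise mutual information (PMI, base 2).
    """
    n_sent = f1 = f2 = f12 = 0
    genres_with_pair = 0
    for sentences in corpus.values():
        found_in_genre = False
        for sentence in sentences:
            tokens = set(sentence)
            has1, has2 = w1 in tokens, w2 in tokens
            n_sent += 1
            f1 += int(has1)
            f2 += int(has2)
            if has1 and has2:
                f12 += 1
                found_in_genre = True
        genres_with_pair += int(found_in_genre)
    # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), with sentence proportions.
    pmi = math.log2((f12 * n_sent) / (f1 * f2)) if f12 else float("-inf")
    return {
        "frequency": f12,
        "range": genres_with_pair / len(corpus),
        "pmi": pmi,
    }


if __name__ == "__main__":
    for pair in [("prendere", "decisione"), ("prendere", "treno")]:
        print(pair, pair_stats(TOY_CORPUS, *pair))
```

In a sketch like this, candidates could then be filtered by minimum thresholds on frequency and range and ordered by the association score. The choice of thresholds and measures is a design decision that the description above does not specify.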

The DICI-A is a resource aimed at students and teachers of Italian as a second language, as well as scholars and researchers. It contains more than 11,000 Italian collocations selected from a vast corpus of authentic Italian texts, defined, described and categorised according to the learners' level of proficiency.

Principal Investigator


Professor Stefania Spina
Department of Italian Language, Literature and Arts in the World
Università per Stranieri di Perugia,
Piazza Fortebraccio, 4, 06123, Perugia