The corpus of the Lessico di frequenza dell'italiano parlato (LIP Corpus) is one of the most important collections of spoken Italian texts. It was collected between 1990 and 1992 by a group of linguists under the direction of Tullio De Mauro and was used to compile, in collaboration with IBM Italy, the first frequency dictionary of spoken Italian (cf. De Mauro, Mancini, Vedovelli, Voghera 1993). Its 469 texts, which total approximately 490,000 words, were recorded in four cities (Milan, Florence, Rome, and Naples) and represent five macro-types and numerous subtypes of discourse.

We thank Tullio De Mauro, Federico Mancini, Massimo Vedovelli, Miriam Voghera, the publisher ETAS Libri S.p.A., and IBM Italy for granting us the right to use the corpus.
