Norm and Variation in Natural Language Processing (original French title: La norme et la variation dans le cadre du Traitement Automatique du Langage)

Andrea Briglia; Massimo Mucciardi; Giovanni Pirrotta
2023-01-01

Abstract

This article addresses the status of norm and variation in NLP, drawing on examples from previous research on computational models of French language acquisition. Two case studies illustrate the choices made along the norm-variation axis: the automatic computation of a frequency distribution, and the recognition of sequential patterns in words containing syllable sequences that are hard to learn because of their intrinsic phonetic difficulty. Whether the unit of analysis is the word (first case study) or the phoneme (second case study), similar obstacles and trade-offs arise: an often difficult and constrained choice between an accurate description of the language and the need for uniform data that the machine can process easily. The article discusses avoidable and unavoidable biases, the precautions to be taken beforehand, and the advantages and disadvantages of these types of NLP models. It concludes by outlining possible future complementarities between qualitative and quantitative methods in contemporary linguistics.
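To make the norm-variation trade-off concrete, the following is a minimal, hypothetical sketch. It is not the authors' implementation, and the toy child-speech tokens, the variant-to-norm mapping and the simplified phoneme symbols are all invented for illustration. The first function computes a word frequency distribution, either on raw forms or after mapping variant forms onto a normalized form. The second counts words whose transcription contains a given phoneme sequence as a contiguous subsequence. This is plain n-gram matching, used here as a stand-in for whatever sequential-pattern method the study actually applied.

```python
from collections import Counter

# Invented toy data: tokens from a hypothetical child's utterances,
# mixing adult forms with variant (child or unaccented) forms.
TOKENS = ["le", "gâteau", "ato", "gateau", "le", "lapin", "apin"]

# Hypothetical mapping from variant forms to a normalized form.
# Deciding which variants to merge is the norm-vs-variation choice.
NORMALIZATION = {"ato": "gâteau", "gateau": "gâteau", "apin": "lapin"}

# Invented phoneme transcriptions (simplified SAMPA-like symbols).
TRANSCRIPTIONS = [
    ["s", "t", "R", "y", "k", "t", "y", "R"],  # "structure"
    ["E", "s", "t", "R", "a", "d"],            # "estrade"
    ["t", "a", "b", "l"],                      # "table"
]


def frequency_distribution(tokens, normalization=None):
    """Count word forms, optionally mapping variants to a norm first."""
    if normalization:
        tokens = [normalization.get(t, t) for t in tokens]
    return Counter(tokens)


def count_pattern(transcriptions, pattern):
    """Count transcriptions containing `pattern` as a contiguous subsequence."""
    pattern = tuple(pattern)
    n = len(pattern)
    return sum(
        any(tuple(ph[i:i + n]) == pattern for i in range(len(ph) - n + 1))
        for ph in transcriptions
    )


if __name__ == "__main__":
    print(frequency_distribution(TOKENS))                 # variation kept
    print(frequency_distribution(TOKENS, NORMALIZATION))  # variation merged
    print(count_pattern(TRANSCRIPTIONS, ["s", "t", "R"]))
```

In this toy case, normalization yields uniform counts that are easier for a machine to handle: "gâteau" rises from 1 to 3. The cost is that the child's own forms ("ato", "apin") disappear, and a study of acquisition may need exactly those forms.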


Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3282157