Classification of Grammatical Categories in Two Longitudinal Child Corpora

Briglia, Andrea; Sauvage, Jérémi; Pirrotta, Giovanni; Mucciardi, Massimo

This article analyses two longitudinal corpora of child spoken language from the CoLaJE project. Each sentence (15,000 in total) was automatically annotated for parts of speech, using Universal Dependencies as the reference standard and stanza, a Python library, as the analysis tool. Age and error rate were used as criteria to create nine strata: reducing the size of the corpus makes the clusters produced by EM (expectation-maximization), an unsupervised method, easier to interpret. The aim of the article is to propose a way to track the development of grammatical categories over time: two examples concerning the development of morphosyntactic coherence are presented, as well as two examples concerning the evolution of the relationship between the use of pronouns and nouns. The article closes with a discussion of the preliminary results and the limitations of this research.
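
As a rough illustration of the pronoun/noun relationship mentioned above, the following minimal sketch computes, for each age band, the share of pronouns among pronoun and noun tokens. It assumes the utterances have already been tagged with Universal Dependencies UPOS labels (for example by reading `word.upos` from stanza output). The toy data, the 12-month band width, and the helper names `age_band` and `pronoun_noun_share` are hypothetical and are not taken from the article.

```python
from collections import Counter

# Hypothetical toy data: (age in months, UPOS tags of one utterance).
# In a real pipeline the tags would come from a tagger such as stanza.
TAGGED_UTTERANCES = [
    (24, ["PRON", "VERB"]),
    (24, ["NOUN"]),
    (24, ["DET", "NOUN"]),
    (36, ["PRON", "VERB", "DET", "NOUN"]),
    (36, ["PRON", "AUX", "VERB"]),
    (36, ["PRON", "VERB", "PRON"]),
]


def age_band(age_months, width=12):
    """Map an age in months to the lower bound of its band.

    The 12-month width is an assumption made for this sketch only.
    """
    return (age_months // width) * width


def pronoun_noun_share(utterances):
    """Return {age band: share of PRON among PRON + NOUN tokens}.

    A band with no pronouns and no nouns maps to None.
    """
    counts = {}
    for age, tags in utterances:
        band_counts = counts.setdefault(age_band(age), Counter())
        band_counts.update(tag for tag in tags if tag in ("PRON", "NOUN"))

    shares = {}
    for band, c in sorted(counts.items()):
        total = c["PRON"] + c["NOUN"]
        shares[band] = c["PRON"] / total if total else None
    return shares


if __name__ == "__main__":
    for band, share in pronoun_noun_share(TAGGED_UTTERANCES).items():
        print(band, share)
```

A per-band ratio of this kind is only one possible summary. It says nothing about the clustering step, which would operate on richer per-sentence features.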