Lexical frequency of CCV onset clusters in Brazilian Portuguese: comparing adult speech, child directed speech and child speech in the open corpora FI and FDC

Andressa Toni


This paper introduces to the linguistic community a new resource for language acquisition studies: the Child Speech Corpus (Corpus FI) and the Child Directed Speech Corpus (Corpus FDC). We built these corpora from the naturalistic database of Santos (2005), using the computational tools of Benevides and Guide (2016). The corpora consist of frequency lists in which the researcher can find phonological and morphological information (phonological transcription, stress transcription, syllabic structure, stress category, lexical category, lemma) extracted from the speech productions of three children (Corpus FI) and their mothers/caregivers (Corpus FDC). The goals of the paper are i) to describe the methods used to compile the corpora, providing a basic usage guide; and ii) to show how these data can contribute to language development research. To that end, we compare the segmental and prosodic frequencies of CCV syllables (Consonant1+Consonant2+Vowel) in adult speech, child-directed speech and child speech, examining how input frequencies influence children’s path of phonological acquisition. Results point to a similarity in the prosodic and segmental properties of CCV across the three corpora: CCV is mostly realized in prosodically salient positions and is usually restricted to the same consonant sequences. Given CCV’s low frequency of use, low minimal-pair count and phonologically opaque contexts, we claim that input frequency is a factor contributing to the long acquisition path of this syllable type, which emerges before age 2;0 but is acquired only between 5;0 and 6;0.

The full text of the article is available in Portuguese (Brazil).


