Effect of utterance style and speaker on the minimum sample size for estimating speech production rate

Pablo Arantes


We investigated how speaking style and individual speaker affect the minimum sample size required for a stable estimate of speaking rate. Two speaking styles were compared: semi-spontaneous interviews and sentence reading. We analyzed 20 speech samples, 10 in each style, from 5 male and 5 female speakers. The stabilization time is the point along the time series of successive cumulative speaking rate values at which variability is reduced. Two criteria for defining stability are presented and compared, one based on statistical change point analysis and one on a perceptual threshold. We also tested the effect of progressively increasing the duration of the sample submitted to stability analysis, from 30 seconds up to 300 seconds. The results show that average stabilization times depend on the criterion used for detection but are generally longer for the semi-spontaneous style, ranging from 60 to 70 seconds for reading and from 80 to 110 seconds for semi-spontaneous speech. Stabilization times tend to increase with sample duration. Speaker sex has no significant effect on stabilization times. Estimates of stabilization time vary between speakers almost as much as they vary within speakers. The results are relevant to forensic phonetics because they indicate, on the basis of an explicit and reproducible methodology, the minimum duration a speech sample must have for speech production rate to be estimated from it for speaker comparison purposes.
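
As an illustration only, the sketch below shows one way a cumulative speaking rate series and a threshold-based stabilization time could be computed. It is not the procedure used in the study. The function names, the unit of analysis (any counted unit with a known duration, such as a syllable), the 5% relative threshold, and the rule of comparing each value against the final cumulative rate are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def cumulative_rate(durations: List[float]) -> List[Tuple[float, float]]:
    """Return (elapsed_time, cumulative_rate) after each unit.

    `durations` holds the duration in seconds of each successive unit
    (hypothetical input; for example, one entry per syllable).
    The cumulative rate is the number of units so far divided by the
    time elapsed so far.
    """
    series = []
    elapsed = 0.0
    for count, duration in enumerate(durations, start=1):
        elapsed += duration
        series.append((elapsed, count / elapsed))
    return series


def stabilization_time(
    series: List[Tuple[float, float]], threshold: float = 0.05
) -> Optional[float]:
    """Return the earliest time after which the rate stays near its final value.

    Assumed criterion (not taken from the study): the series is stable from
    the first point after which every cumulative rate value lies within
    `threshold` (relative deviation) of the final cumulative rate.
    Returns None for an empty series.
    """
    if not series:
        return None
    final_rate = series[-1][1]
    stable_from = None
    for elapsed, rate in series:
        if abs(rate - final_rate) / final_rate <= threshold:
            # Start of a candidate stable stretch, unless one is already open.
            if stable_from is None:
                stable_from = elapsed
        else:
            # The rate left the tolerance band, so discard the candidate.
            stable_from = None
    return stable_from
```

Calling `stabilization_time(cumulative_rate(durations))` on a list of unit durations returns a time in seconds. Because this criterion is anchored to the final cumulative rate, the result depends on how much of the recording is submitted, which is the kind of dependence on sample duration the abstract reports.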


