Effects of speaking style on the shape of fundamental frequency distributions

Pablo Arantes

Abstract

The present study has two main goals. The first is to describe the effects of three speaking styles (spontaneous interview, sentence reading and word list reading) on statistical estimators of fundamental frequency (f0) variability (mean, standard deviation, skewness and kurtosis) in five female and five male speakers of Brazilian Portuguese (BP). Most f0 distributions from word list reading are bimodal. Analysis of the corresponding time-normalized contours suggests that the bimodality is caused by the time-compressed realization of fast transitions from low to high or high to low tones aligned with stressed syllables. When only unimodal distributions are considered, speaking style has no statistically significant effect on any of the four estimators in the male data. Effects do appear in the female data: spontaneous speech has significantly higher mean, SD and skewness than read speech. Previous findings for languages other than BP indicate the reverse pattern, though. The second goal of the study is to characterize the statistical properties of f0 distributions beyond mean and SD. Results confirm previous observations that most f0 distributions have positive skewness, are right-tailed and have kurtosis values that deviate significantly from those of the normal distribution because of large deviations from the central or modal value. A distribution fitting procedure tested six candidate distributions. The asymmetric Burr type XII distribution emerges as the one that best fits the data in the corpus. Two of the parameters that determine its shape correlate well with the SD and skewness of the empirical f0 distributions. Important effects of speaking style on f0 seen in female speakers can be reproduced by combinations of the Burr distribution's parameters.
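As an illustration of the quantities named in the abstract, the sketch below computes the four estimators (mean, SD, skewness and kurtosis) for a list of f0 values and draws synthetic values from a Burr type XII distribution by inverting its cumulative distribution function. It is a minimal sketch under stated assumptions, not the procedure used in the study: the function names `describe` and `burr12_sample`, the divisor-n moment formulas and the parameter values in the usage note are all hypothetical choices made for this example.

```python
import math
import random


def describe(values):
    """Return mean, SD, skewness and kurtosis of a sample of f0 values.

    Assumption: population-style moments (divisor n), so kurtosis is
    about 3 for normally distributed data.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


def burr12_sample(n, c, k, scale, seed=0):
    """Draw n values from a Burr type XII distribution.

    Uses inverse-transform sampling on the CDF
    F(x) = 1 - (1 + (x / scale) ** c) ** (-k),
    where c and k are shape parameters (illustrative names).
    """
    rng = random.Random(seed)
    return [
        scale * ((1.0 - rng.random()) ** (-1.0 / k) - 1.0) ** (1.0 / c)
        for _ in range(n)
    ]
```

For example, `describe(burr12_sample(5000, c=10.0, k=1.0, scale=180.0))` summarizes a synthetic sample whose theoretical median is 180 (read here as Hz). These parameter values are arbitrary and are not estimates taken from the corpus.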
