Different statistical analyses may lead to categorically distinct conclusions

Ronaldo Mangueira Lima Jr,
Guilherme Duarte Garcia

Abstract

In this study, we illustrate how statistical significance can vary across analyses by comparing four methods: the t-test, ANOVA (followed by Tukey HSD), simple linear regression, and mixed-effects linear regression. In our demonstration, we model reaction times as a function of different affixes in Danish and show how our conclusions about the effect of certain affixes can change categorically depending on which of these methods we choose. Finally, we echo recent studies (e.g., BARR et al., 2013) and suggest that mixed-effects models be the norm whenever grouped data are analyzed. With this comparison, we hope to raise researchers’ awareness of the need for well-informed and ethical analytical decisions in linguistic studies.
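
The point about grouped data can be illustrated with a minimal sketch. It does not use the Danish data or the four methods compared in the article. It assumes simulated log reaction times for two hypothetical affix conditions ("A" and "B"), with every simulated subject contributing trials to both. All names and parameter values (`simulate`, `compare`, the effect size, the variances) are assumptions made for illustration. A by-subject paired t statistic stands in here as a simple proxy for a model with by-subject random intercepts; it is not a mixed-effects regression.

```python
import math
import random
import statistics


def simulate(n_subjects=20, n_trials=10, effect=0.05, seed=1):
    """Simulate (subject, condition, log RT) rows; all values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for subject in range(n_subjects):
        # Each subject has their own baseline speed (between-subject variation).
        baseline = rng.gauss(6.5, 0.3)
        for condition, shift in (("A", 0.0), ("B", effect)):
            for _ in range(n_trials):
                # Trial-level noise is much smaller than subject-level variation.
                rows.append((subject, condition, baseline + shift + rng.gauss(0, 0.1)))
    return rows


def welch_t(x, y):
    """Two-sample t statistic that treats every observation as independent."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se


def paired_t(diffs):
    """One-sample t statistic on per-subject condition differences."""
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def compare(rows):
    """Return (t ignoring grouping, t respecting grouping by subject)."""
    a = [rt for _, cond, rt in rows if cond == "A"]
    b = [rt for _, cond, rt in rows if cond == "B"]
    naive = welch_t(a, b)

    diffs = []
    for subject in sorted({s for s, _, _ in rows}):
        mean_a = statistics.mean(rt for s, c, rt in rows if s == subject and c == "A")
        mean_b = statistics.mean(rt for s, c, rt in rows if s == subject and c == "B")
        diffs.append(mean_b - mean_a)
    grouped = paired_t(diffs)
    return naive, grouped


if __name__ == "__main__":
    naive_t, grouped_t = compare(simulate())
    print(f"t ignoring subjects:   {naive_t:.2f}")
    print(f"t grouping by subject: {grouped_t:.2f}")
```

Both statistics share the same numerator (the mean difference between conditions), but the first folds between-subject variation into its standard error while the second removes it. The same data can therefore land on opposite sides of a conventional significance threshold depending on whether the grouping is modeled.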

The full text of the article is available in Portuguese (Brazil).

References

BAAYEN, Rolf Harald. languageR: v 1.0, 2007a.
BAAYEN, Rolf Harald. Analyzing Linguistic Data: A practical introduction to statistics using R, Cambridge: Cambridge University Press, 2007b.
BAAYEN, Rolf Harald; DAVIDSON, Doug; BATES, Douglas. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, v. 59, n. 4, p. 390–412, 2008. https://doi.org/10.1016/j.jml.2007.12.005
BALLING, Laura Winther; BAAYEN, Rolf Harald. Morphological effects in auditory word recognition: Evidence from Danish. Language and Cognitive Processes, v. 23, n. 7–8, p. 1159–1190, 2008. https://doi.org/10.1080/01690960802201010
BERGER, James O.; SELLKE, Thomas. Testing a point null hypothesis: The irreconcilability of p values and evidence. Journal of the American Statistical Association, v. 82, n. 397, p. 112–122, 1987. https://doi.org/10.2307/2289138
CHAMBERS, John M. S, R, and Data Science. The R Journal, v. 12, n. 1, p. 462–476, 2020. https://doi.org/10.32614/RJ-2020-028. Accessed 24 Nov. 2020.
COHEN, Jacob. The earth is round (p < .05). American Psychologist, v. 49, n. 12, p. 997–1003, 1994. https://doi.org/10.1037/0003-066X.49.12.997
GARCIA, Guilherme Duarte. Introduction to data analysis using R, 2019. Available at https://guilhermegarcia.github.io/rWorkshop/garcia_rWorkshop_complete.html.
GARCIA, Guilherme D. Data visualization and analysis in second language research. New York: Routledge, 2021. In press.
GODOY, Mahayana Cristina. Introdução aos modelos lineares mistos para os estudos da linguagem. PsyArXiv, 2019. https://doi.org/10.17605/OSF.IO/9T8UR
GODOY, Mahayana C.; NUNES, Marcus A. Uma comparação entre ANOVA e modelos lineares mistos para análise de dados de tempo de resposta. Revista da ABRALIN, v. 19, n. 1, p. 1–23, 17 Jul. 2020. https://doi.org/10.25189/rabralin.v19i1.1388
GRIES, Stefan Th. Statistics for linguistics with R: A practical introduction. Berlin: Walter de Gruyter, 2013.
HALSEY, Lewis G.; CURRAN-EVERETT, Douglas; VOWLER, Sarah L.; DRUMMOND, Gordon B. The fickle P value generates irreproducible results. Nature Methods, v. 12, n. 3, p. 179–185, 2015. https://doi.org/10.1038/nmeth.3288
HASSEMER, Julius; WINTER, Bodo. Producing and perceiving gestures conveying height or shape. Gesture, v. 15, n. 3, p. 404–424, 2016. https://doi.org/10.1075/gest.15.3.07has
JOHNSON, Douglas H. The insignificance of statistical significance testing. The Journal of Wildlife Management, p. 763–772, 1999. https://doi.org/10.2307/3802789
KRUSCHKE, John K. Doing Bayesian data analysis: a tutorial with R, JAGS, and Stan. 2nd ed. Elsevier, 2015.
LARSON-HALL, Jenifer. A guide to doing statistics in second language research using SPSS and R. Routledge, 2015.
LEVSHINA, Natalia. How to do linguistics with R: Data exploration and statistical analysis. Amsterdam: John Benjamins Publishing Company, 2015.
LIMA JR, Ronaldo Mangueira; GARCIA, Guilherme Duarte; ANGELE, Bernhard. Introdução a modelos de regressão para linguistas no R, 2020. Available at https://guilhermegarcia.github.io/rling.html
LOERTS, Hanneke; LOWIE, Wander; SETON, Bregtje. Essential Statistics for Applied Linguistics: Using R or JASP. Amsterdam: Macmillan International, Red Globe Press, 2020.
LOFTUS, Geoffrey R. A picture is worth a thousand p values: On the irrelevance of hypothesis testing in the microcomputer age. Behavior Research Methods, Instruments, & Computers, v. 25, n. 2, p. 250–256, 1993. https://doi.org/10.3758/BF03204506
NUZZO, Regina. Scientific method: statistical errors. Nature News, v. 506, n. 7487, p. 150, 2014.
OUSHIRO, Livia. Introdução à Estatística para Linguistas, v. 1.0.1 (Dec/2017), 2017. Available at http://rpubs.com/oushiro/iel.
R CORE TEAM. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at http://www.R-project.org/. Accessed 15 Jun. 2020.
SONDEREGGER, Morgan; WAGNER, Michael; TORREIRA, Francisco. Quantitative Methods for Linguistic Data. v. 1.0 (Oct/2018), 2018. Available at http://people.linguistics.mcgill.ca/~morgan/book/
WAGENMAKERS, Eric-Jan. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, v. 14, n. 5, p. 779–804, 2007. https://doi.org/10.3758/BF03194105
WINTER, Bodo. Statistics for linguists: An introduction using R. Routledge, 2019.