Efficient trade-offs as explanations in functional linguistics: some problems and an alternative proposal

The notion of efficient trade-offs is frequently used in functional linguistics in order to explain language use and structure. In this paper I argue that this notion is more confusing than enlightening. Not every negative correlation between parameters represents a real trade-off. Moreover, trade-offs are usually reported between pairs of variables, without taking into account the role of other factors. These and other theoretical issues are illustrated in a case study of linguistic cues used in expressing “who did what to whom”: case marking, rigid word order and medial verb position. The data are taken from the Universal Dependencies corpora in 30 languages and annotated corpora of online news from the Leipzig Corpora collection. We find that not all cues are correlated negatively, which questions the assumption of language as a zero-sum game. Moreover, the correlations between pairs of variables change when we incorporate the third variable. Finally, the relationships between the variables are not always bidirectional. The study also presents a causal model, which can serve as a more appropriate alternative to trade-offs.


REVISTA DA ABRALIN
If human languages are efficient, they should be located on a Pareto frontier. In other words, there should be a negative correlation between two linguistic variables. These variables can represent costs, as in Zipf's (1949) trade-off between the Speaker's and the Addressee's efforts (see below). Alternatively, they can represent benefits, such as different types of information available to the hearer, as in the case of the trade-off between information conveyed by word-internal structure (morphology) and word order (KOPLENIG et al., 2017).
Trade-offs are closely related to competing motivations in language (DU BOIS, 1985). Language users and learners are driven by different communicative and cognitive pressures. For example, system pressure (analogy), which forces human language users to organize linguistic forms into systems in which classes of forms behave similarly, can be in conflict with economic motivation (HASPELMATH, 2014). In particular, it would be less costly for articulation if English had a singulative form for "pea" (something like "pea-one") and an unmarked plural form instead of "peas", as in Welsh, because we seldom speak about one pea only (Andersen's fairy tale The Princess and the Pea is a famous exception). The system pressure leads to a cognitively simpler system, which might be easier to acquire and manage in language production. The higher the articulatory costs, the lower the cognitive costs, and the other way round.
Another example is competition between phonological transparency and articulatory efficiency.
Consider final devoicing of stems and affixes. For example, the noun kod "code" in Russian has the Genitive singular form kod-a ['koda], while the Nominative singular form is kod-Ø [kot], which sounds like kot "cat". This and other phonological alternations make articulation easier, but reduce the degree of transparency (i.e. one-to-one mapping between form and meaning) and consequently the degree of learnability of a language (HENGEVELD; LEUFKENS, 2018). As put informally by Joseph Greenberg, "[a] speaker is like a lousy auto mechanic: every time [s]he fixes something in the language, [s]he screws up something else" (CROFT, 2002, p. 5).
At the same time, there are numerous problems associated with the concept of trade-off as an explanation in functional linguistics. These problems have seldom been discussed. Notable exceptions are Fenk-Oczlon and Fenk (2008) and Sinnemäki (2008; 2014). It is very tempting to interpret any negative correlation as an efficient trade-off. The present paper argues that such an interpretation is justified if and only if the following conditions are met: 1) the variables participating in the negative correlation can be clearly defined as costs or benefits; 2) there are only two correlated variables, and no other factors involved; 3) the correlated variables are functionally related, representing one type of linguistic task; 4) the relationships between the variables are bidirectional, not one-directional.
As will be shown in Section 1, these conditions are hardly ever met. Therefore, the concept of trade-off in linguistics brings more confusion than insights and should be dropped altogether. Instead, we should replace analysis of correlations between pairs of linguistic variables with causal analysis of multiple factors. These issues are illustrated in a case study of expression of core arguments in 30 languages (Section 2). Section 3 offers the conclusions and an outlook for future research.

Problems with trade-offs in functional linguistics
Trade-offs are assumed to exist between two types of costs or benefits. The aim of this section is to demonstrate that this assumption is often difficult to meet. Sometimes one linguistic variable involved in a presumed trade-off can represent different costs or benefits. Also, these costs and benefits are often difficult to define. The interpretation then becomes problematic.
One of the most popular trade-offs in the literature is the negative correlation between rigid word order and case morphology. Languages tend to use either explicit case marking (e.g. Latin or Lithuanian) or rigid word order (e.g. English or Mandarin Chinese). This correlation has been interpreted as a trade-off of different complexity types (SINNEMÄKI, 2014). The correlation is uncontroversial. What many studies of this correlation leave unclear, however, is which costs for a language user are entailed by rigid or flexible word order, and whether these orders can also offer any benefits (FENK-OCZLON; FENK, 2008).
In research on linguistic complexity, it is believed that fixed word order in the domain of argument discrimination makes language more complex because it adds an extra constraint (e.g. SINNEMÄKI, 2008). At the same time, it can be argued that a language with some regularity and some freedom can be more difficult to acquire and process than either a language with random word order or a language with completely fixed word order. A similar operationalization of complexity is given in Gell-Mann (1995), according to whom effective complexity can be high only in the region between total order and complete disorder. So, it is not clear whether languages with rigid word order are necessarily more complex than flexible languages, since the latter usually have a bias towards a certain order, e.g. Subject followed by Object (LEVSHINA, 2019). They may also have additional rules, which require the non-dominant order (e.g. Object followed by Subject) in specific contexts. These rules will increase the complexity. On the other hand, completely rigid word order is rare, as well.
Word order flexibility is a gradient phenomenon, and we need a better understanding of how this gradience should be reflected by the metrics of linguistic complexity.
If we speak about the costs and benefits of word order variability for language users, rather than the abstract complexity of a linguistic system, the picture does not become much clearer. First of all, rigid word order has benefits for the addressee in the sense that it can make the assignment of syntactic roles to sentence elements easier (FENK-OCZLON; FENK, 2008). Similarly, according to Hale's (2006) entropy reduction hypothesis, the difficulty in processing of a sentence depends on the number of bits conveyed by each following word. If word order is free, it may be more difficult to predict the next word, and the processing effort will be higher. Therefore, fixed order can be less costly, after all, if we take into account the addressee's interests.
At the same time, fixed word order has some side effects. In particular, it can be less optimal for management of information flow, e.g. by fronting the topic or putting backgrounded information in the very end of a sentence. If this variation is not allowed by grammar, language users will need to use additional markers in order to convey this pragmatic information, such as it-clefts, e.g. It is John who Mary loves. This creates additional articulation costs. Rigid word order also allows for fewer options in minimization of distances between dependent and head words, which can make sentences more costly, both for the speaker and the addressee, by increasing memory and integration costs.
To summarize, upon closer inspection, the famous trade-off between word order and morphology falls apart into a web of diverse interests of the speaker and the addressee. The interests of the language learner are yet another important aspect, which requires further research.
In lexicon, one can mention a trade-off between cognitive and communicative costs discussed by Kemp, Xu and Regier (2018). If a language has a large vocabulary with fine-grained distinctions in a particular domain, the cognitive costs of maintaining such a vocabulary are high. For example, detailed systems of kinship terms or colour terms are more costly than simple ones in that regard.
The communicative costs occur when the speaker does not deliver her message with enough precision. For example, when hearing the word "aunt", it is not clear whether the father's or the mother's sister is meant. Basically, these costs represent the risk of potential miscommunication. Using computational modelling, Kemp et al. show that these two types of costs correlate negatively in real languages. There are systems with high cognitive costs, but low communicative costs (e.g. detailed kinship terms systems, as in Northern Paiute, an indigenous language of northern California) and systems with low cognitive costs and high communicative costs (kinship terms system with fewer distinctions, as in English). There are no systems in which both costs are high or both are low, so all languages are located close to a Pareto frontier.
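The notion of a Pareto frontier over two types of costs can be made concrete with a minimal sketch. The system labels and cost values below are hypothetical, chosen only for illustration and not taken from Kemp, Xu and Regier (2018); the function name pareto_frontier is likewise an assumption.

```python
def pareto_frontier(systems):
    """Return the names of systems that no other system dominates.

    A system dominates another if it is no worse on both costs
    and strictly better on at least one of them.
    """
    frontier = []
    for name, (cog, com) in systems.items():
        dominated = any(
            (c2 <= cog and m2 <= com) and (c2 < cog or m2 < com)
            for other, (c2, m2) in systems.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)


# Hypothetical (cognitive cost, communicative cost) pairs.
systems = {
    "detailed": (9.0, 1.0),
    "intermediate": (5.0, 4.0),
    "simple": (2.0, 8.0),
    "inefficient": (8.0, 7.0),
}

print(pareto_frontier(systems))
```

In this toy example, the system labelled "inefficient" is worse than "intermediate" on both costs and therefore falls off the frontier, while the other three each represent a different compromise between the two costs.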
This account leaves many questions open. Is a "simple" language less cognitively costly because it is easy to learn for L1 and L2 speakers? It can also be that users of a "simple" language spend less effort on extracting words from the long-term memory because the few words in the vocabulary are more easily accessible due to their high frequency, or because there is simply less competition between the words. Do communicative costs include articulation costs of using longer periphrastic expressions, such as "my father's sister" in a cognitively simple system? Which of these potential costs weigh more and which weigh less? A full-fledged efficiency account would require all these details.
From a mathematical perspective, a trade-off represents a negative correlation. In principle, every negative correlation can be regarded as a trade-off in a very abstract sense: if one quantity decreases, then the other increases, and the other way round. But if we want to appeal to the principle of efficiency, we should assume that a presumed trade-off is a result of rational choices made by language users. If the condition of free choice is not met, it is better to speak of a negative correlation, in order to avoid confusion.
From this it follows that a trade-off can only hold between functionally related linguistic variables which help to solve one and the same task, or hinder its accomplishment (SINNEMÄKI, 2008). An example is provided in Section 2, which discusses the cues that help us identify the subject and object of a sentence. Negative correlations between randomly selected linguistic variables, e.g. number of possible syllables in a language and level of inflectional synthesis (SHOSTED, 2006), are difficult to interpret as trade-offs.
Since trade-offs should involve rational choices, these choices should be available for both types of costs involved in a potential trade-off. To give a simple example, one can indulge in instant gratification, spending all money now on pleasant things and having nothing for tomorrow, or one can save money for a rainy day but have a less enjoyable life now. It is free choice in both directions.
Many correlations in the literature, however, do not fulfil this criterion. This means that they are not true trade-offs in the sense defined here.
Probably the most important negative correlation in communicative efficiency research is the one between context and amount of information encoded by the speaker in a message (ARIEL, 2014). Context can be defined as everything that belongs to the common ground shared by the speaker and the addressee (CLARK, 1996). Common ground includes preceding linguistic context, beliefs about the communities the interlocutors belong to, and information about the physical context and common past experience. There is ample evidence that common ground leads to shorter referential expressions used by interlocutors and in general shorter exchanges (e.g. CLARK; WILKES-GIBBS, 1986). Ariel's (1990) Accessibility Theory can be regarded as a correlation between context and coding length: there is a tendency for more accessible referents to be expressed by shorter forms (e.g. pronouns or zero expression) than less accessible ones, which are expressed by longer forms (e.g. noun phrases).
Zipf's law of abbreviation, which says that frequent words tend to be shorter than infrequent words (ZIPF, 1965 [1935]), can also be interpreted as a negative correlation between coding length and ease of access due to high resting activation of frequent words. More recently, it has been shown by Piantadosi, Tily and Gibson (2011) that the correlations between n-gram-based predictability and word length are stronger than those between frequency and length. In phonology, there is ample evidence that words and segments that are more predictable undergo phonetic reduction more frequently than less predictable units (JAEGER; BUZ, 2017). In grammar, this correlation can be found in markedness phenomena. Greenberg (1966a) was the first to show systematically that more frequent categories (e.g. singular and present tense) are expressed by unmarked forms, while the less frequent ones (e.g. plural and future tense) are expressed by marked forms. It has been explained by the tendency to provide less formal marking to more predictable categories (e.g. singular), and more marking to less predictable ones (e.g. plural) (HASPELMATH, 2008; 2014).⁴ Here one can also mention the efficient use of optional markers, e.g. complementizer "that" (JAEGER, 2010) and the Japanese [...]

⁴ Although some nouns may be more frequently used in the plural than in the singular, e.g. pea and peas (see Introduction), singular nouns are more frequent in general than plural. The split number marking of the Welsh type is unusual. Moreover, all languages with singulative coding also have ordinary plural marking for other nouns (HASPELMATH; KARJUS, 2017).

Thus, there is convincing evidence of the negative correlation between amount of linguistic encoding and accessibility of information from context in a very broad sense. Can one call it an efficient trade-off? Not really. The reason is that the relationship is not free. The ease of access is determined by common ground or other factors. It is something given. A language user adjusts the amount of coding to the ease of access given in the situation, but cannot adjust the ease of access to the amount of coding they want to use.⁵

In Section 1.1 we discussed the negative correlation between rigid word order and case morphology. In their large-scale study, Koplenig et al. (2017) speak about a general trade-off between information carried by word order and information carried by word-internal structure, measured with the help of information-theoretic concepts. The almost 1000 languages in their sample reveal a clear negative correlation. Isolating languages with high scores on information conveyed by word order, such as Mandarin Chinese, have low scores on information carried by word structure, while polysynthetic languages like Greenlandic Inuktitut or Ojibwa have low word order scores and high word structure scores. Koplenig et al. argue that this trade-off is efficient:

"If, for example, grammatical relationships in a sentence are fully determined by the ordering of words, it would constitute unnecessary cognitive effort to additionally encode this information with intralexical regularities. If, however, word ordering gives rise to some extent of grammatical ambiguity, we should expect this ambiguity to be cleared up with the help of word structure regularities in order to avoid unsuccessful transmission." (KOPLENIG et al., 2017, p. 4)

From this it follows that fixed word order triggers loss of morphological complexity. What explains the emergence of fixed word order is not clear. Therefore, this relationship seems to be unidirectional and cannot be regarded as a trade-off in the proper sense.
The trade-offs discussed in the literature are usually binary (but see FENK-OCZLON; FENK, 2008; SINNEMÄKI, 2008). However, there is always a chance that the relationship can change dramatically if other relevant factors are taken into account.
To illustrate this point, let us discuss Zipf's (1949) famous idea of two opposing forces: the Force of Unification and the Force of Diversification. The Force of Unification represents the speaker's economy: in the ideal case, the speaker only has one word that covers all meanings. There is no need to spend effort in order to choose between words (this is known as paradigmatic economy). The Force of Diversification represents the addressee's economy: there should be a specific word for each meaning that can be verbalized. A balance between these two forces leads to a compromise: human languages have a small convenient vocabulary of more general reference, and a large vocabulary of more precise reference. The famous Zipf's law (1949), which posits a negative correlation between the frequency of a word and its rank, is evidence for such a vocabulary balance.

⁵ To be more precise, Zipf's law of abbreviation seems to have a more complex explanation. A quantitative causal model by Baayen, Milin and Ramscar (2016) suggests that there is a causal relationship from co-textual predictability of a word to its length, and from its length to frequency. In other words, we choose shorter forms for predictable meanings, and these forms are then used more often because they are short.

Although Zipf's law is a well-established empirical fact, the trade-off between the speaker and addressee's interests is not unproblematic. In particular, Ariel (2014) argues that highly polysemous constructions, in which the meaning has to be inferred, have greater support from context (preceding discourse, non-linguistic information present in the common ground, etc.) than monosemous constructions. In fact, Piantadosi, Tily and Gibson (2012) argue that all efficient communication systems should be ambiguous, provided that there is sufficient context that can help to infer the meaning. This means that another trade-off comes into play, that is, the one between encoded information and common ground/accessibility, which was discussed in Section 1.2. Therefore, less encoding means in normal communication that the speaker considers the contextual cues to be sufficient for the addressee to understand the message. For example, a referent that has been recently introduced can be encoded by a shorter pronominal form or omitted altogether. The contextual cues help the addressee to infer the information, even if the verbal expression is ambiguous or vague, e.g. asking "Is there a bank near here?" after hearing that the store does not accept cards. Therefore, Zipf's proposal can only hold if we control for the amount of available context. Obviously, this is impossible to do in realistic settings. So, one may ask if Zipf's law is indeed explained by this trade-off between the Forces of Unification and Diversification. A more likely cause is the high accessibility of frequent forms, which can be easily extended to new contexts (HARMON; KAPATSINSKY, 2017).
Another problematic case is the negative correlation between memory costs and articulatory costs formulated by Martinet (1963, p. 165). For example, the verb "enlarge" is less accessible but more compact than a periphrastic expression "make bigger", which consists of more accessible elements but is longer. The claim that easily accessible periphrastic expressions have higher articulatory costs is not immediately convincing, however, because words that are easier to access are more frequent, and, as we know from Zipf's (1965 [1935]) law of abbreviation, frequent words tend to be shorter and therefore easier to articulate. Unfortunately, the total length of the same message in formal and informal language is difficult to evaluate because we do not have parallel register-to-register corpora yet, so Martinet's claim remains a hypothesis.
Pareto efficiency means that different types of costs should be negatively correlated. However, in reality linguistic variables representing costs or benefits can be positively correlated, as well. For example, creole languages have low complexity across multiple domains (phonology, morphology and syntax), while 'old' languages have high complexity across the same domains (MCWHORTER, 2001). This means that domain-specific costs for language learners can be positively correlated, as well as articulatory costs for speakers, if we focus on obligatory grammatical marking, for example.
Moreover, different cues can even have a synergetic effect. For example, when expressing and interpreting some message, one modality of communication should be easier to process than several.

In spoken languages, a message is transmitted via two major modalities: auditory message and visual signals, which are produced by the head, face, hands, arms and torso. Some of these signals may be relevant or irrelevant, which means that we need extra effort to distinguish between them, especially under time constraints of spontaneous interaction with quick turn-taking. One would believe that processing one modality should be at the cost of the other. However, this is not what we see. There is evidence that interlocutors respond faster to questions that have an accompanying manual and/or head gesture, than to questions without such visual components (HOLLER; KENDRICK; LEVINSON, 2018). In fact, Holler and Levinson (2019) argue that multimodal information is easier to process than unimodal (that is, only visual or only auditory) information because visual bodily signals may reduce uncertainty at the message level. Humans are good at creating multimodal Gestalts as a result of message unification. As a result, different costs have a synergetic effect. Communication is therefore not Pareto-efficient.

A case study: different cues in expressing subject and object
This section investigates the relationships between different cues which can help to communicate "who did what to whom". One type of cues is formal markers, including case marking and agreement.
Another type is fixed word order, which can help to identify the thematic roles of the constituents (e.g. SAPIR, 1921). The position of the verb can be another cue. It is believed that it is easier to process the sentence and infer the roles when the verb is in the medial position between the subject and the object:

"[V]erb position is the particular vehicle which most conveniently enables these basic grammatical relations to be expressed by means of word order: the subject occurs to the immediate left, and the object to the immediate right of the verb. I.e. the verb acts as an anchor" (HAWKINS, 1986, pp. 48-49)

There is experimental evidence that users tend to avoid SOV in favour of SVO when describing reversible transitive events in pantomime, that is, those events where both participants can be subject or object, such as "The mother hugs the boy" and "The boy hugs the mother" (HALL; MAYBERRY; FERREIRA, 2013). This can be interpreted as evidence that verb-medial order indeed helps to identify the roles.
There is another reason why the position of the verb in the middle is beneficial for language processing. The sum distances from the head verb to the subject and object are the smallest when the verb is between subject and object (FERRER-I-CANCHO, 2017), which reduces the processing load.
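The arithmetic behind this claim can be checked with a small sketch. It assumes a bare three-word clause with a one-word subject and a one-word object, and measures distance as the difference in word positions; both are simplifying assumptions made here for illustration only.

```python
from itertools import permutations


def summed_distance(order):
    """Sum of the verb-subject and verb-object distances in word positions."""
    v = order.index("V")
    return abs(v - order.index("S")) + abs(v - order.index("O"))


# All six logically possible orders of S, V and O.
distances = {"".join(p): summed_distance(p) for p in permutations("SVO")}
for order, dist in sorted(distances.items()):
    print(order, dist)
```

Under these assumptions, the two verb-medial orders (SVO and OVS) have the smallest summed distance, while every verb-initial or verb-final order adds one more position.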
Finally, we should not underestimate the role of semantics and encyclopaedic knowledge. In most situations, it is a dog that bites a man or a police officer who captures a thief, and not the other way round. This information can be important for the use of the cues. For example, there is a correlation between the predictability of events and the use of overt object marking in Japanese (KURUMADA; JAEGER, 2015). Abstract referential features, such as animacy and identifiability, play an important role in differential marking, as in Spanish or Hebrew, and in probabilistic case marker use, as in Korean (LEE, 2009). There is a negative correlation between predictability and marking, which can be explained by efficiency considerations (JÄGER, 2007; LEVSHINA, 2018).
If the idea of efficient trade-offs is correct, we can expect negative correlations between all these cues (cf. SINNEMÄKI, 2008). Previous quantitative studies have shown a negative correlation between argument marking and rigid word order (SINNEMÄKI, 2014); as well as an association between zero argument marking and verb-medial order (SINNEMÄKI, 2010). The correlation between the final position of the verb and case marking is well known as Greenberg's (1966b) Universal 41: "If in a language the verb follows both the nominal subject and nominal object as the dominant order, the language almost always has a case system". However, the three parameters have never been investigated simultaneously. Also, for the first time, these parameters will be estimated from corpora, rather than from grammars, as in the previous studies. As will become clear, the parameters are gradient and should be treated as continuous variables. I will first present a series of pairwise correlations between these parameters. It will be shown that taking the third variable into account can change the picture significantly, which means that the idea of studying trade-offs between two variables only is very questionable. The correlational analyses will allow us to formulate a hypothesis about the relationships between all three cues, which will be tested in a causal analysis.
The language sample used for the present study includes thirty languages, which are listed in Table 1. The choice of languages was determined by the availability of sufficient data. Two sources were used: the Universal Dependencies (UD) corpora, version 2.6 (ZEMAN et al., 2020)⁶ and online news corpora of 1 million sentences from the Leipzig Corpora Collection (GOLDHAHN; ECKART; QUASTHOFF, 2012)⁷. These two different collections were used in order to ensure that our results are not due to register bias, since the UD corpora represent very diverse types of texts. Also, some UD corpora are very small. As will be demonstrated, the correlations between the parameters based on each type of data are very high, which gives us confidence in the results.
In the online news corpora, each language is represented by one million sentences from online news (categories "news" and "newscrawl"). The corpora contain sentences in random order. The sentences were tokenized, lemmatized and morphologically and syntactically annotated with the help of the UD corpus tools in the R package udpipe (WIJFFELS, 2020). The language models, which were trained on the UD corpora, provide, among other things, universal parts-of-speech tags and dependency relations, which can be compared across different languages. This is crucial for the purposes of the present study.

⁶ https://universaldependencies.org/
⁷ https://wortschatz.uni-leipzig.de/en/download

Case marking was operationalized as distinctness of the forms representing transitive subject and object, following the token-based approach in Levshina (2019). The new method can give us more precise information about how frequently case markers can help language users to distinguish between the main participants. This matters for languages with differential and optional case marking.
For example, in Russian some nouns have different forms in the Nominative and Accusative (e.g. devočk-a "girl-Nom" and devočk-u "girl-Acc"), while some nouns have identical forms (e.g. stol "table" or myš "mouse"). The question is how frequently the forms are identical, and how frequently they are distinct. Similarly, some languages like Japanese and Korean have variable marking of subject and object with complex probabilistic rules. All this variability should be taken into account.
There is no reliable morphological annotation at the moment, which could be used to compare the forms in many different languages. The information about formal distinctness was approximated using the existing corpora in the following way. First, I extracted all nouns (wordforms in lower case and lemmas) with the universal syntactic dependency tags "nsubj" (nominal subject) and "obj" (object). In order to take into account languages like Spanish, where the object case marker a is a preposition, I also checked if the head noun had a syntactic dependency "case", and merged the case marker with the noun, e.g. a_mujer "woman.ACC". Only non-plural forms were considered in order to exclude the formal variation based on number. I do not expect this restriction to influence the results strongly because plural forms are less frequent than singular ones. For languages with articles written as one word with the nouns (Arabic, Bulgarian, Danish, Romanian and Swedish), subject and object forms were compared separately for definite and indefinite forms because it was too difficult to split them automatically. Indonesian possessive suffixes were not counted as part of wordforms.
Next, for every lemma used as both transitive subject and object in the corpus, the subject and object forms were listed. One form was selected randomly to represent a subject form, and one form to represent an object form, and these forms were compared. The total number of lemmas with distinct forms was computed for each language. This number was weighted by the lemma frequency, so that frequent lemmas had more weight than rare ones. Finally, the distinctiveness scores were divided by the total token frequency of all lemmas that were analyzed.
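The scoring procedure described above can be sketched roughly as follows. This is an illustrative reading of the description, not the code used in the study: the function name, the input format (triples of lemma, wordform and dependency label) and the choice to take a lemma's frequency as the number of its subject and object tokens are all assumptions.

```python
import random
from collections import defaultdict


def distinctness_score(tokens, seed=0):
    """tokens: iterable of (lemma, wordform, role), role in {"nsubj", "obj"}."""
    rng = random.Random(seed)
    forms = defaultdict(lambda: {"nsubj": [], "obj": []})
    for lemma, form, role in tokens:
        forms[lemma][role].append(form)

    distinct_weight = 0
    total_weight = 0
    for lemma, by_role in forms.items():
        subj, obj = by_role["nsubj"], by_role["obj"]
        if not subj or not obj:
            continue  # only lemmas attested in both roles are analysed
        freq = len(subj) + len(obj)  # assumed lemma frequency (the weight)
        total_weight += freq
        # One randomly selected subject form vs. one random object form.
        if rng.choice(subj) != rng.choice(obj):
            distinct_weight += freq
    return distinct_weight / total_weight if total_weight else 0.0


# Hypothetical toy data based on the Russian examples in the text.
tokens = [
    ("devočka", "devočka", "nsubj"),
    ("devočka", "devočka", "nsubj"),
    ("devočka", "devočku", "obj"),
    ("stol", "stol", "nsubj"),
    ("stol", "stol", "obj"),
]
score = distinctness_score(tokens)
print(score)
```

In the toy data, the lemma with distinct forms accounts for three of the five analysed tokens, so the score is the weighted share of those tokens.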
Following previous research (e.g. SINNEMÄKI, 2008) and the tradition in typology, the analyses presented below were performed on subjects and objects expressed by common nouns (Universal Part of Speech tag "NOUN"). However, I also computed scores for all possible subjects and objects [...]

There is a very strong correlation between the two types of data: r = 0.952, p < 0.001. It is not clear what explains the large discrepancies for Tamil, Lithuanian and Korean. Possible reasons can be the small size of the available UD corpora and the noise in the automatically parsed online news corpora.
Indexing of subject and object (agreement) is not investigated in this paper. Previous research has shown that subject agreement is not significantly correlated with word order or case marking, whereas object agreement correlates negatively with the presence of both factors simultaneously (SINNEMÄKI, 2008). Unfortunately, my sample of languages does not allow me to test object agreement statistically. I leave that to future research.

If the order of subject and object is fixed, it can be a reliable cue of the syntactic roles. In order to measure word order rigidity, I used anti-entropy, which is 1 minus the Shannon entropy of the order of subject and object. Shannon's entropy has been used to represent flexibility in word order (LEVSHINA, 2019). The formula for computing entropy of orders SO and OS is as follows:

$$H = -\big(P(\mathrm{SO}) \log_2 P(\mathrm{SO}) + P(\mathrm{OS}) \log_2 P(\mathrm{OS})\big)$$

where the probabilities of SO and OS were computed as simple proportions of each word order taken from the corpora.
The entropy score is minimal when either subject is always before object or the other way round,

The third variable was 'verb-medialness', which shows how frequently the head verb occurs between the subject and the object. The procedure was as follows. I computed the number of all clauses (main and finite subordinate clauses) with overt subject and object ("nsubj" and "obj" relationships). Next, I computed the proportion of all clauses where the lexical verb is in the middle. The scores based on the UD corpora and the online news corpora are displayed in Figure 3. The correlation between the scores in the UD corpora and in the online news is nearly perfect: r = 0.992, p < 0.001. One can see a gap between strictly SOV languages (Japanese, Tamil

This section tests the relationships between the three types of cues. Recall that a trade-off requires a negative correlation between two parameters. Let us test if this requirement is met. Figure 5 displays Spearman's rank-based correlations between the pairs of variables. The results for both data sources are very similar.
The correlation between rigid word order and formal distinctness is negative: more rigid word order means less distinct subject and object forms (p < 0.001). It is also instructive to look at a scatter plot with language names in Figure 6, which shows this relationship in more detail. It tells us that languages with similar forms (the left-hand side of the corresponding small plot) indeed have rigid word order, but that languages with less similar forms are somewhat more variable with regard to word order rigidity. For example, Finnish, Japanese, Korean and Persian have highly distinct forms, but quite rigid word order, while Hungarian and Tamil also have distinct forms, but variable word order. This means that the trade-off is not perfectly symmetric, and the relationship is to some extent implicational, rather than fully correlational: Lack of formal distinctions strongly implies rigid word order, but rigid word order less strongly implies low formal distinctness, as shown by Finnish, Korean, Japanese and Persian.

The next correlation is between distinct forms and verb medialness. The correlation is again negative, as predicted (p < 0.001). Therefore, high formal distinctness should mean that the verb is less frequently in the middle, and low formal distinctness should mean that the verb is more frequently in the middle. However, the scatter plot shown in Figure 7 suggests again that this is a simplification. When the forms are not distinct, the verb is typically between subject and object, as the large cluster of languages in the bottom right corner shows. Yet, when the forms are distinct, the verb can be anywhere. For example, it is rarely medial in Turkish, Hindi, Japanese, Korean and Tamil (see top left corner), but usually medial in the Baltic and Finnic languages (see top right corner). This relationship is even more obviously implicational than in the previous plot.

Finally, we observe a positive correlation between rigid word order and verb-medialness. This finding is similar to the results reported by Sinnemäki (2010), who used categorical data from a large sample of typologically diverse languages. The positive correlation is a case of cue redundancy. The distribution of the scores is shown in Figure 8. We can see that very rigid word order in French, Indonesian or English is strongly associated with verb-medial position, but the verb-final languages on the left behave in very diverse ways.
So far, we have discussed pairwise correlations that did not take into account the presence of the third variable. However, this analysis is incomplete because when testing the correlation between two types of cues, we need to control for the third one. In order to do so, one can use partial correlation coefficients. They are shown in Table 2. The coefficients for the UD corpora and the online news corpora are similar, which means that our results are robust. The numbers demonstrate that the correlation between formal distinctness and rigid word order is the strongest one, followed by the negative correlation between formal distinctness and verb-medialness. This is similar to the previous results. The correlations are now weaker, however. The most striking difference is that the correlation between rigid word order and verb-medialness disappears when we take into account formal distinctness.
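To make the logic of controlling for a third variable concrete, here is a minimal sketch of a first-order partial correlation. It assumes rank-based (Spearman) coefficients, as in the pairwise analysis. Whether the coefficients in Table 2 were computed in exactly this way is an assumption, and the function names and toy values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented scores for four languages; x and y are both closely related to z.
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
z = [1, 2, 4, 3]
pairwise = spearman(x, y)
controlled = partial_spearman(x, y, z)
```

In this toy example the pairwise coefficient of 0.6 between x and y drops to about -0.11 once z is controlled for, which mirrors the kind of change reported above for rigid word order and verb-medialness.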
One may object that the data are dependent because many of the languages come from the same families and genera (that is, Baltic, Germanic, Romance, Slavic and Finnic). If we take into account these dependencies, traditional correlational analysis is not appropriate any more. Additional tests (LEVSHINA, In preparation) based on permutation and resampling support the quantitative results presented here.
The quantitative analyses have revealed a negative correlation between rigid word order and distinct forms of subject and object. We also found a negative correlation between distinct forms and medial position of the verb. Rigid word order and verb-medialness are correlated positively, but this correlation disappears when the formal distinctness is taken into account. This supports the idea of Fenk-Oczlon and Fenk (2008) that trade-offs are more likely to be observed between different linguistic domains (e.g. syntax and morphology, or semantics and phonology) than within the same domain (see also SINNEMÄKI, 2008).
We also saw in the scatter plots that languages lacking formal distinctness have rigid word order, and tend to have the verb in the middle. So, one might think that lack of formal distinctness causes language users to provide cues with the help of word order. If one changes the perspective, it is also possible to say that the languages with rigid word order have low formal distinctness, whereas SOV languages tend to have high distinctness, so one could claim that it is word order that can explain case marking. So, what is the direction of causality: from word order to case marking, or the other way round?
There are some arguments in the literature that word order can determine case marking. According to Kiparsky (1996), the shift to VO began in Old English before the collapse of the case system (and also before the loss of subject-verb agreement). Similarly, Bauer (2009) shows that the change to VO and rigid word order in Late and Vulgar Latin was before the loss of inflection in Romance.
There is a hypothesis that Indo-European languages drift from SOV to SVO and rigid word order, which leads to the loss of inflections (KOCH, 1974). Since most of the languages in our sample are Indo-European, this may be an explanation of the correlations we observe.
There is also experimental evidence of a causal link from word order to case marking. In a study by Fedzechkina, Newport and Jaeger (2016), learners were presented with miniature artificial languages containing optional case marking and either fixed or flexible constituent order. It was found that the learners of the fixed order language used case marking significantly less often than the other learners, and less often than in the input language, which means that rigid word order indeed triggers the loss of distinct forms. At the same time, the word order properties of the input languages remained stable.
In order to test this hypothesis, we should move from binary correlations to multivariate causal analysis (BLASI; ROBERTS, 2017). A causal analysis using the PC algorithm (SPIRTES; GLYMOUR; SCHEINES, 2000; KALISH et al., 2012) produces the directed acyclic graph shown in Figure 9. The arrows represent the direction of effect of one variable on another, with the significance level of 0.05.
The results for the UD corpora and the online news data are identical. Similar results are obtained with the help of a resampling method, where one draws one language per genus 1,000 times, logging the probability of every link, and computes the average probability (LEVSHINA, In preparation).
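The resampling idea can be sketched as follows. The sketch only shows the sampling of one language per genus and the bookkeeping of how often each link is detected. The causal discovery step itself is passed in as a function, here replaced by a hypothetical stand-in, and the reported value is the share of resamples in which a link appears, which is a simplifying assumption about the actual procedure.

```python
import random
from collections import Counter


def sample_one_per_genus(genus_of, rng):
    """Draw one language at random from every genus."""
    by_genus = {}
    for language, genus in genus_of.items():
        by_genus.setdefault(genus, []).append(language)
    return [rng.choice(sorted(langs)) for _, langs in sorted(by_genus.items())]


def link_frequencies(genus_of, find_links, n_iter=1000, seed=1):
    """Share of resamples in which each link is returned by find_links.

    find_links is a hypothetical callable that takes a list of languages
    and returns the causal links detected for that subsample.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        sample = sample_one_per_genus(genus_of, rng)
        counts.update(set(find_links(sample)))
    return {link: count / n_iter for link, count in counts.items()}


# Toy genus assignment for a handful of languages.
genus_of = {
    "Latvian": "Baltic",
    "Lithuanian": "Baltic",
    "English": "Germanic",
    "German": "Germanic",
    "Finnish": "Finnic",
}


def always_same_link(sample):
    """Stand-in for a causal discovery step (illustration only)."""
    return [("rigid word order", "formal distinctness")]


frequencies = link_frequencies(genus_of, always_same_link, n_iter=100)
one_sample = sample_one_per_genus(genus_of, random.Random(0))
```

Because every resample contains only one language per genus, closely related languages cannot jointly inflate the evidence for a link.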
The graph tells us that both word order variables contribute jointly to the distinctness of subject and object forms, which supports the theoretical claims from the literature discussed above. The word order variables are not causally related on their own, which is in line with the results of the partial correlational analysis. A new finding is that the verb position also affects formal distinctness. In particular, we can hypothesize that verb-finalness increases the distinctness of subject and object.

Discussion
This paper has discussed a popular idea in functional linguistics, namely, that different costs or benefits are in relationships of efficient trade-offs, which can be thought of as Pareto frontiers. I argued that there are many conceptual and methodological problems with that idea. First, it is difficult to identify the exact nature of costs and benefits. Second, a negative correlation between costs or benefits does not always mean that the language user can make a rational choice. Third, binary trade-offs ignore other relevant costs and benefits. Therefore, it would be safer to drop the term "trade-off" altogether.
In game theory and economics, the situation of Pareto efficiency is also known as a zero-sum game, where the interacting parties' aggregate gains and losses add up to zero. It has been argued, however, that there is an increasing chance of finding non-zero-sum solutions as the complexity of a system increases (WRIGHT, 2000). Language as a highly complex system is not a zero-sum game.
As an illustration, I presented a case study of three types of cues that help to differentiate between subject and object: rigid word order, medial position of the verb and formal distinctness of the arguments provided by case morphemes and adpositions. The results of correlational analyses demonstrate that not all cues are efficiently related. There can be redundancy in the amount of information available to the addressee. Also, we have seen that some relationships are more implicational than correlational, which also leads to cue redundancy. The only thing disfavoured by the languages is the absence of any cues. It seems that a breakdown of communication (with additional costs of reanalysis and conversational repair) is more dangerous than wasting the resources. This conclusion is in line with typological evidence, which suggests that all languages have some amount of redundancy (HENGEVELD; LEUFKENS, 2018).
Taking the speaker's perspective, we can say that the speaker saves effort by providing less overt coding when the word order provides sufficient information. This is efficient behaviour, but it is difficult to treat it as a real trade-off because, unlike the articulatory efforts required for production of case marking, it is not clear what kind of costs word order has for the speaker (see also the discussion in Section 1.1). Also, the existence of languages with case marking but fairly rigid and verb-medial word order suggests that the speaker's behaviour is not always efficient.
At the moment, we do not know what the costs of acquiring more or less flexible word order are for learners. I leave the question of trade-offs in language acquisition open.
Finally, I argued that bivariate correlations should be replaced with multivariate causal analysis and showed how this can be done for the three types of cues. This study has demonstrated that word order determines case marking, but not the other way round. It seems that fixed word order allows case marking to disappear. Also, it may be that verb-final languages tend to develop and maintain case forms. These causal hypotheses are preliminary and need to be further investigated on a larger sample without the Indo-European bias. Other linguistic and extralinguistic factors, such as agreement, semantics, population size and the presence of intensive language contact, should also be taken into account.
It is easy to understand why the idea of a trade-off is appealing: it is very simple and intuitive. If you take a larger slice of a cake, the others will get less. In fact, people have a bias towards zero-sum thinking, which persists on a personal level and as a cultural worldview ideology (RÓŻYCKA-TRAN; BOSKI; WOJCISZKE, 2015). Zero-sum thinking makes people choose win-lose strategies instead of trying to find win-win solutions, a tendency that has probably become too obvious in world politics nowadays. Our task as scientists is to prevent people from falling into this cognitive trap, and, of course, not to commit this mistake ourselves.