Efficient trade-offs as explanations in functional linguistics: some problems and an alternative proposal

Natalia Levshina

Abstract

The notion of efficient trade-offs is frequently used in functional linguistics to explain language use and structure. In this paper I argue that this notion is more confusing than enlightening. Not every negative correlation between parameters represents a real trade-off. Moreover, trade-offs are usually reported between pairs of variables, without taking into account the role of other factors. These and other theoretical issues are illustrated in a case study of the linguistic cues used to express “who did what to whom”: case marking, rigid word order and medial verb position. The data are taken from the Universal Dependencies corpora in 30 languages and from annotated corpora of online news from the Leipzig Corpora Collection. We find that not all cues are correlated negatively, which calls into question the assumption that language is a zero-sum game. Moreover, the correlations between pairs of variables change when we incorporate a third variable. Finally, the relationships between the variables are not always bidirectional. The study also presents a causal model, which can serve as a more appropriate alternative to trade-offs.

Aims of this paper

Efficiency can be defined as the minimization of the ratio of costs to benefits. To put it simply, a person behaves efficiently when they do not spend more effort than necessary to achieve their goals. In language, the costs can be related to processing, articulation and acquisition, while the main benefit is the fulfilment of one’s communicative needs. Although one can also think of aesthetic, social and other benefits, these are less frequently discussed in the literature.

The fundamental question of functional linguistics is why human languages are as they are. According to a widely held view, one of the driving forces of language change is the efficient choices that language users make during interaction. These choices can then become conventionalized, in accordance with the “invisible hand” principle (KELLER, 1994[1]).

The idea that language users try to behave efficiently has a long history. As early as the 19th century, the German philologist Georg Curtius (1820–1885) explained phonetic attrition (Verwitterung “weathering”) by the language users’ drive for Bequemlichkeit “comfort”. This drive is counterbalanced by the tendency to preserve meaning-bearing sounds and syllables, which resist attrition in order to remain recognizable (DELBRÜCK, 1908, p. 143-144[2]). Thus, language users tend to minimize their effort while making sure that the important meanings are conveyed. Throughout the 20th century, the idea that language users try to save effort was a recurrent topic in linguistics, from Zipf’s (1949[3]) principle of least effort to Haiman’s (1983[4]) economic motivation in grammar and Keller’s maxim “Talk in such a way that you do not spend more energy than you need to attain your goal” (1994, p. 107[1]). In the 21st century, these ideas have been made more concrete and tested with diverse data sources and state-of-the-art methods, including multilingual corpora, artificial language learning experiments, multivariate statistical models and approaches from information theory (GIBSON et al., 2019[5]).

Efficiency can explain the form and use of diverse grammatical constructions, words and phonological units. Examples include Zipf’s (1965[1935][6]) law of abbreviation in the lexicon (see Section 1.2 for more detail), the minimization of distances between syntactically and semantically related words, which makes processing easier (e.g. GIBSON, 2000[7]; FERRER-I-CANCHO, 2006[8]), efficient phonetic reduction in language production (JAEGER; BUZ, 2017[9]), and the efficient use of referential expressions in discourse (CLARK; WILKES-GIBBS, 1986[10]; ARIEL, 1990[11]). More examples can be found in Hawkins (2004[12]), Jaeger and Tily (2011[13]), Levshina (2018[14]) and Gibson et al. (2019[5]).

We can speak of a trade-off when spending limited resources on gaining in one respect leads to losing in another. For example, during the coronavirus pandemic there was an implicit assumption in the media that keeping the economy going could only be done at the cost of public safety. Another trade-off is between protecting the environment and maintaining a high standard of living in industrialized countries. In linguistics, there is a view that languages which are simple in one respect are likely to be complex in others (cf. SHOSTED, 2006[15]). These real or perceived trade-offs play an important role in the way we understand the world.

A trade-off can be represented visually as shown in Figure 1. The axes represent two potential costs, and the dots are observations from imaginary data. The line corresponds to the so-called Pareto frontier. Observations on the Pareto frontier are optimal (or Pareto-efficient) because it is impossible to reduce one cost without increasing the other; observations close to the frontier are nearly optimal.
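To make the notion of a Pareto frontier concrete, the following minimal Python sketch (an illustration, not part of the original study) identifies the Pareto-efficient observations in a set of two-cost data points, assuming that lower values are better on both axes. The observations are invented, in the spirit of the imaginary data in Figure 1.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost 1, cost 2); lower is better on both axes


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both costs and strictly better on at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the Pareto-efficient (non-dominated) points, sorted by the first cost."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


# Hypothetical observations, in the spirit of the imaginary data in Figure 1
observations = [(1.0, 5.0), (2.0, 3.0), (3.0, 2.5), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
print(pareto_frontier(observations))
# [(1.0, 5.0), (2.0, 3.0), (3.0, 2.5), (4.0, 1.0)]
```

The point (3.0, 4.0) is excluded because (2.0, 3.0) is cheaper on both costs; each remaining point can only lower one cost by raising the other.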

Figure 1. A Pareto frontier based on imaginary data with two different costs

If human languages are efficient, they should be located on a Pareto frontier. In other words, there should be a negative correlation between two linguistic variables, as represented by the line in Figure 1. The variables can represent different types of costs, as in Zipf’s (1949[3]) trade-off between the speaker’s and the addressee’s efforts (see below). Alternatively, they can represent benefits, such as different types of information available to the hearer, as in the trade-off between information conveyed by word-internal structure (morphology) and information conveyed by word order (KOPLENIG et al., 2017[16]).

Trade-offs are closely related to competing motivations in language (DU BOIS, 1985[17]). Language users and learners are driven by different communicative and cognitive pressures. For example, system pressure (analogy), which forces language users to organize linguistic forms into systems in which classes of forms behave similarly, can be in conflict with economic motivation (HASPELMATH, 2014[18]). In particular, it would be less costly for articulation if English had a singulative form for “pea” (something like “pea-one”) and an unmarked plural form instead of “peas”, as in Welsh, because we seldom speak about one pea only (Andersen’s fairy tale The Princess and the Pea is a famous exception). System pressure, in contrast, leads to a cognitively simpler system, which might be easier to acquire and manage in language production. The higher the articulatory costs, the lower the cognitive costs, and the other way round.

Another example is competition between phonological transparency and articulatory efficiency. Consider final devoicing of stems and affixes. For example, the noun kod “code” in Russian has the Genitive singular form kod-a ['koda], while the Nominative singular form is kod-Ø [kot], which sounds like kot “cat”. This and other phonological alternations make articulation easier, but reduce the degree of transparency (i.e. one-to-one mapping between form and meaning) and consequently the degree of learnability of a language (HENGEVELD; LEUFKENS, 2018[19]). As put informally by Joseph Greenberg, “[a] speaker is like a lousy auto mechanic: every time [s]he fixes something in the language, [s]he screws up something else” (CROFT, 2002, p. 5[20]).

At the same time, there are numerous problems associated with the concept of trade-off as an explanation in functional linguistics. These problems have seldom been discussed; notable exceptions are Fenk-Oczlon and Fenk (2008[21]) and Sinnemäki (2008[22]; 2014[23]).1 It is very tempting to interpret any negative correlation as an efficient trade-off. The present paper argues that such an interpretation is justified if and only if the following conditions are met:

1) the variables participating in the negative correlation can be clearly defined as costs or benefits;

2) there are only two correlated variables, and no other factors are involved;

3) the correlated variables are functionally related, representing one type of linguistic task;

4) the relationships between the variables are bidirectional, not unidirectional.

As will be shown in Section 1, these conditions are hardly ever met. Therefore, the concept of trade-off in linguistics brings more confusion than insight and should be dropped altogether. Instead, we should replace the analysis of correlations between pairs of linguistic variables with a causal analysis of multiple factors. These issues are illustrated in a case study of the expression of core arguments in 30 languages (Section 2). Section 3 offers conclusions and an outlook for future research.

1. Problems with trade-offs in functional linguistics

1.1. The problems with defining costs and benefits

Trade-offs are assumed to exist between two types of costs or benefits. The aim of this section is to demonstrate that this assumption is often difficult to meet. Sometimes one linguistic variable involved in a presumed trade-off can represent different costs or benefits. Also, these costs and benefits are often difficult to define. The interpretation then becomes problematic.

One of the most popular trade-offs in the literature is the negative correlation between rigid word order and case morphology. Languages tend to use either explicit case marking (e.g. Latin or Lithuanian) or rigid word order (e.g. English or Mandarin Chinese). This correlation has been interpreted as a trade-off between different types of complexity (SINNEMÄKI, 2014[23]). The correlation itself is uncontroversial. What many studies leave unclear, however, is which costs rigid or flexible word order entails for a language user, and whether it can also offer any benefits (FENK-OCZLON; FENK, 2008[21]).

In research on linguistic complexity, it is believed that fixed word order in the domain of argument discrimination makes language more complex because it adds an extra constraint (e.g. SINNEMÄKI, 2008[22]).2 At the same time, it can be argued that a language with some regularity and some freedom can be more difficult to acquire and process than either a language with random word order or a language with completely fixed word order. A similar operationalization of complexity is given in Gell-Mann (1995[24]), according to whom effective complexity can be high only in the region between total order and complete disorder. So, it is not clear whether languages with rigid word order are necessarily more complex than flexible languages, since the latter usually have a bias towards a certain order, e.g. Subject followed by Object (LEVSHINA, 2019[25]). They may also have additional rules, which require the non-dominant order (e.g. Object followed by Subject) in specific contexts. These rules will increase the complexity. On the other hand, completely rigid word order is rare, as well. Word order flexibility is a gradient phenomenon, and we need a better understanding of how this gradience should be reflected by the metrics of linguistic complexity.
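One way of treating word order flexibility as a gradient, rather than binary, property is to compute the Shannon entropy of the observed orders of subject and object. The sketch below assumes hypothetical corpus counts of Subject-before-Object (SO) and Object-before-Subject (OS) clauses; it illustrates one possible metric and is not necessarily the one adopted later in this study. Entropy is 0 bits for completely rigid order and 1 bit when both orders are equally frequent.

```python
import math
from typing import Mapping


def order_entropy(counts: Mapping[str, int]) -> float:
    """Shannon entropy, in bits, of the relative frequencies of word orders."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical counts of Subject-before-Object (SO) and Object-before-Subject (OS) clauses
print(order_entropy({"SO": 100, "OS": 0}))            # 0.0: completely rigid order
print(round(order_entropy({"SO": 90, "OS": 10}), 3))  # 0.469: flexible, but biased towards SO
print(order_entropy({"SO": 50, "OS": 50}))            # 1.0: no preference at all
```

The middle case shows why flexibility is gradient: a language can allow both orders while still strongly preferring one of them.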

If we speak about the costs and benefits of word order variability for language users, rather than the abstract complexity of a linguistic system, the picture does not become much clearer. First of all, rigid word order benefits the addressee because it can make it easier to assign syntactic roles to sentence elements (FENK-OCZLON; FENK, 2008[21]). Similarly, according to Hale’s (2006[26]) entropy reduction hypothesis, the difficulty of processing each word in a sentence depends on how much that word reduces the uncertainty (entropy, measured in bits) about the rest of the sentence. If word order is free, it may be more difficult to predict the next word, and the processing effort will be higher. Therefore, fixed order can be less costly after all, if we take the addressee’s interests into account.

At the same time, fixed word order has some side effects. In particular, it can be suboptimal for managing information flow, e.g. by fronting the topic or putting backgrounded information at the very end of a sentence. If such variation is not allowed by the grammar, language users need additional markers to convey this pragmatic information, such as it-clefts, e.g. It is John who Mary loves. This creates additional articulation costs. Rigid word order also leaves fewer options for minimizing the distances between dependents and their heads, which can make sentences more costly for both the speaker and the addressee by increasing memory and integration costs.

To summarize, upon closer inspection, the famous trade-off between word order and morphology falls apart into a web of diverse interests of the speaker and the addressee. The interests of the language learner are yet another important aspect, which requires further research.

In the lexicon, one can mention a trade-off between cognitive and communicative costs discussed by Kemp, Xu and Regier (2018[27]). If a language has a large vocabulary with fine-grained distinctions in a particular domain, the cognitive costs of maintaining such a vocabulary are high. For example, detailed systems of kinship terms or colour terms are more costly than simple ones in that regard. Communicative costs arise when the speaker does not deliver her message with enough precision. For example, when an addressee hears the word “aunt”, it is not clear whether the father’s or the mother’s sister is meant. Basically, these costs represent the risk of potential miscommunication.3 Using computational modelling, Kemp et al. show that these two types of costs correlate negatively in real languages. There are systems with high cognitive costs but low communicative costs (e.g. detailed kinship term systems, as in Northern Paiute, an indigenous language of northern California) and systems with low cognitive costs and high communicative costs (kinship term systems with fewer distinctions, as in English). There are no systems in which both costs are high or both are low, so all languages are located close to a Pareto frontier.

This account leaves many questions open. Is a “simple” language less cognitively costly because it is easier to learn for L1 and L2 learners? It may also be that users of a “simple” language spend less effort on retrieving words from long-term memory because the few words in the vocabulary are more easily accessible due to their high frequency, or simply because there is less competition between the words. Do communicative costs include the articulation costs of using longer periphrastic expressions, such as “my father’s sister”, in a cognitively simple system? Which of these potential costs weigh more and which weigh less? A full-fledged efficiency account would require all these details.

1.2. The problems of similar functions and rational choice

From a mathematical perspective, a trade-off represents a negative correlation. In principle, every negative correlation can be regarded as a trade-off in a very abstract sense: if one quantity decreases, then the other increases, and the other way round. But if we want to appeal to the principle of efficiency, we should assume that a presumed trade-off is a result of rational choices made by language users. If the condition of free choice is not met, it is better to speak of a negative correlation, in order to avoid confusion.

It follows that a trade-off can only exist between functionally related linguistic variables that help to solve one and the same task, or hinder its accomplishment (SINNEMÄKI, 2008[22]). An example is provided in Section 2, which discusses the cues that help us identify the subject and object of a sentence. Negative correlations between arbitrarily selected linguistic variables, e.g. the number of possible syllables in a language and the level of inflectional synthesis (SHOSTED, 2006[15]), are difficult to interpret as trade-offs.

Since trade-offs should involve rational choices, these choices should be available for both types of costs involved in a potential trade-off. To give a simple example, one can indulge in instant gratification, spending all one’s money now on pleasant things and having nothing left for tomorrow, or one can save money for a rainy day but have a less enjoyable life now. The choice is free in both directions. Many correlations in the literature, however, do not fulfil this criterion, which means that they are not true trade-offs in the sense defined here.

Probably the most important negative correlation in communicative efficiency research is the one between context and the amount of information encoded by the speaker in a message (ARIEL, 2014[34]). Context can be defined as everything that belongs to the common ground shared by the speaker and the addressee (CLARK, 1996[28]). Common ground includes the preceding linguistic context, beliefs about the communities the interlocutors belong to, and information about the physical context and shared past experience. There is ample evidence that common ground leads to shorter referential expressions and, in general, shorter exchanges (e.g. CLARK; WILKES-GIBBS, 1986[10]). Ariel’s (1990[11]) Accessibility Theory can be regarded as describing a correlation between context and coding length: more accessible referents tend to be expressed by shorter forms (e.g. pronouns or zero expression), and less accessible ones by longer forms (e.g. noun phrases).

Zipf’s law of abbreviation, which says that frequent words tend to be shorter than infrequent words (ZIPF, 1965[1935][6]), can also be interpreted as a negative correlation between coding length and ease of access, which is due to the high resting activation of frequent words. More recently, Piantadosi, Tily and Gibson (2011[29]) have shown that the correlations between n-gram-based predictability and word length are stronger than those between frequency and length. In phonology, there is ample evidence that more predictable words and segments undergo phonetic reduction more frequently than less predictable units (JAEGER; BUZ, 2017[9]). In grammar, this correlation can be found in markedness phenomena. Greenberg (1966a[30]) was the first to show systematically that more frequent categories (e.g. singular and present tense) are expressed by unmarked forms, while less frequent ones (e.g. plural and future tense) are expressed by marked forms. This has been explained by the tendency to provide less formal marking to more predictable categories (e.g. singular), and more marking to less predictable ones (e.g. plural) (HASPELMATH, 2008[31]; 2014[18]).4 One can also mention the efficient use of optional markers, e.g. the complementizer “that” (JAEGER, 2010[32]) and the Japanese object marker -o (KURUMADA; JAEGER, 2015[33]). These markers are used more frequently in situations where the marked element or its grammatical role is less predictable from world knowledge or linguistic experience.
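The law of abbreviation is typically tested as a rank correlation between word frequency and word length. The following sketch computes Spearman’s rho (the Pearson correlation of ranks, with tied values given their mean rank) on a small invented lexicon; the words, frequencies and lengths are hypothetical. A negative value means that more frequent words tend to be shorter.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based positions in the sorted order
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy lexicon: word -> (corpus frequency, length in letters)
toy = {"the": (5000, 3), "of": (3000, 2), "house": (300, 5),
       "window": (120, 6), "pea": (15, 3), "abbreviation": (4, 12)}
freqs = [freq for freq, _ in toy.values()]
lengths = [length for _, length in toy.values()]
print(round(spearman(freqs, lengths), 3))  # -0.667
```

A rank-based coefficient is used here because frequency distributions are highly skewed, so a monotonic rather than linear relationship is what the law predicts.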

Thus, there is convincing evidence of the negative correlation between amount of linguistic encoding and accessibility of information from context in a very broad sense. Can one call it an efficient trade-off? Not really. The reason is that the relationship is not free. The ease of access is determined by common ground or other factors. It is something given. A language user adjusts the amount of coding to the ease of access given in the situation, but cannot adjust the ease of access to the amount of coding they want to use.5

In Section 1.1 we discussed the negative correlation between rigid word order and case morphology. In their large-scale study, Koplenig et al. (2017[16]) speak about a general trade-off between the information carried by word order and the information carried by word-internal structure, measured with information-theoretic concepts. The nearly 1,000 languages in their sample reveal a clear negative correlation. Isolating languages with high scores on information conveyed by word order, such as Mandarin Chinese, have low scores on information carried by word structure, while polysynthetic languages like Greenlandic Inuktitut or Ojibwa have low word order scores and high word structure scores. Koplenig et al. argue that this trade-off is efficient:

If, for example, grammatical relationships in a sentence are fully determined by the ordering of words, it would constitute unnecessary cognitive effort to additionally encode this information with intra-lexical regularities. If, however, word ordering gives rise to some extent of grammatical ambiguity, we should expect this ambiguity to be cleared up with the help of word structure regularities in order to avoid unsuccessful transmission. (KOPLENIG et al., 2017, p. 4[16])

It follows that fixed word order triggers the loss of morphological complexity. What explains the emergence of fixed word order in the first place, however, remains unclear. Therefore, this relationship seems to be unidirectional and cannot be regarded as a trade-off in the proper sense.

1.3. The problem of multiple factors

The trade-offs discussed in the literature are usually binary (but see FENK-OCZLON; FENK, 2008[21]; SINNEMÄKI, 2008[22]). However, there is always a chance that the relationship can change dramatically if other relevant factors are taken into account.

To illustrate this point, let us discuss Zipf’s (1949[3]) famous idea of two opposing forces: the Force of Unification and the Force of Diversification. The Force of Unification represents the speaker’s economy: in the ideal case, the speaker has only one word that covers all meanings, so no effort needs to be spent on choosing between words (this is known as paradigmatic economy). The Force of Diversification represents the addressee’s economy: there should be a specific word for each meaning that can be verbalized. A balance between these two forces leads to a compromise: human languages have a small convenient vocabulary of more general reference and a large vocabulary of more precise reference. Zipf’s famous law (1949[3]), according to which the frequency of a word is roughly inversely proportional to its frequency rank, is evidence for such a vocabulary balance.
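In its usual formulation, Zipf’s rank-frequency law can be written as follows, where f(r) is the frequency of the word with frequency rank r, C is a constant that depends on corpus size, and the exponent α is an empirical parameter close to 1 in many corpora. The logarithmic form shows why the law appears as a straight line with negative slope on a log-log plot.

```latex
f(r) = \frac{C}{r^{\alpha}}, \qquad \log f(r) = \log C - \alpha \log r, \qquad \alpha \approx 1
```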

Although Zipf’s law is a well-established empirical fact, the trade-off between the speaker’s and the addressee’s interests is not unproblematic. In particular, Ariel (2014[34]) argues that highly polysemous constructions, whose meaning has to be inferred, receive greater support from context (preceding discourse, non-linguistic information present in the common ground, etc.) than monosemous constructions. In fact, Piantadosi, Tily and Gibson (2012[35]) argue that all efficient communication systems should be ambiguous, provided that there is sufficient context to help infer the meaning. This means that another trade-off comes into play, namely the one between encoded information and common ground/accessibility, which was discussed in Section 1.2. In normal communication, less encoding means that the speaker considers the contextual cues sufficient for the addressee to understand the message. For example, a recently introduced referent can be encoded by a short pronominal form or omitted altogether. Contextual cues help the addressee to infer the information even if the verbal expression is ambiguous or vague, e.g. asking “Is there a bank near here?” after hearing that the store does not accept cards. Consequently, Zipf’s proposal can only hold if we control for the amount of available context, which is obviously impossible in realistic settings. So, one may ask whether Zipf’s law is indeed explained by the trade-off between the Forces of Unification and Diversification. A more likely cause is the high accessibility of frequent forms, which can easily be extended to new contexts (HARMON; KAPATSINSKY, 2017[36]).

Another problematic case is the negative correlation between memory costs and articulatory costs formulated by Martinet (1963, p. 165[37]). For example, the verb “enlarge” is less accessible but more compact than the periphrastic expression “make bigger”, which consists of more accessible elements but is longer. The claim that easily accessible periphrastic expressions have higher articulatory costs is not immediately convincing, however, because words that are easier to access are more frequent, and, as we know from Zipf’s (1965[1935][6]) law of abbreviation, frequent words tend to be shorter and therefore easier to articulate. Since compact but less accessible words like “enlarge” are typical of formal language and periphrastic expressions like “make bigger” of informal language, testing the claim would require comparing the total length of the same message in both registers. Unfortunately, this is difficult because we do not yet have parallel register-to-register corpora, so Martinet’s claim remains a hypothesis.

1.4. Positive correlations and synergy instead of competition

Pareto efficiency implies that different types of costs should be negatively correlated. In reality, however, linguistic variables representing costs or benefits can also be positively correlated. For example, creole languages have low complexity across multiple domains (phonology, morphology and syntax), while ‘old’ languages have high complexity across the same domains (MCWHORTER, 2001[38]). This means that domain-specific costs for language learners can be positively correlated, and so can articulatory costs for speakers, at least if we focus on obligatory grammatical marking.

Moreover, different cues can even have a synergetic effect. One might expect a message expressed in one modality to be easier to produce and interpret than a message expressed in several. In spoken languages, a message is transmitted via two major modalities: the auditory signal and visual signals produced by the head, face, hands, arms and torso. Some of these visual signals are communicatively relevant and others are not, which means that extra effort is needed to distinguish between them, especially under the time constraints of spontaneous interaction with quick turn-taking. One might therefore expect that processing one modality comes at the cost of the other. However, this is not what we see. There is evidence that interlocutors respond faster to questions accompanied by a manual and/or head gesture than to questions without such visual components (HOLLER; KENDRICK; LEVINSON, 2018[39]). In fact, Holler and Levinson (2019[40]) argue that multimodal information is easier to process than unimodal (that is, only visual or only auditory) information because visual bodily signals may reduce uncertainty at the message level. Humans are good at creating multimodal Gestalts through message unification. The different costs thus have a synergetic effect, and communication is therefore not Pareto-efficient.

2. A case study: different cues in expressing subject and object

2.1. Theoretical background and previous research

This section investigates the relationships between different cues that help to communicate “who did what to whom”. One type of cue is formal marking, including case marking and agreement. Another is fixed word order, which can help to identify the thematic roles of the constituents (e.g. SAPIR, 1921[41]). The position of the verb can be yet another cue. It is believed that it is easier to process the sentence and infer the roles when the verb is in the medial position between the subject and the object:

[V]erb position is the particular vehicle which most conveniently enables these basic grammatical relations to be expressed by means of word order: the subject occurs to the immediate left, and the object to the immediate right of the verb. I.e. the verb acts as an anchor (HAWKINS, 1986, pp. 48-49[42])

There is experimental evidence that people tend to avoid SOV in favour of SVO when describing reversible transitive events in pantomime, that is, events in which either participant can be the subject or the object, such as “The mother hugs the boy” and “The boy hugs the mother” (HALL; MAYBERRY; FERREIRA, 2013[43]). This can be interpreted as evidence that verb-medial order indeed helps to identify the roles.

There is another reason why placing the verb in the middle is beneficial for language processing. The sum of the distances from the head verb to the subject and the object is smallest when the verb is between the subject and the object (FERRER-I-CANCHO, 2017[44]), which reduces the processing load.
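The claim about dependency distances can be checked with simple arithmetic. The sketch below assumes a hypothetical clause in which the subject, verb and object are each a single word, and sums the linear distances from the verb to its two arguments for all six basic orders. Under this simplifying assumption, only the verb-medial orders SVO and OVS reach the minimum total distance of 2; all other orders give 3.

```python
from itertools import permutations


def verb_dependency_length(order: str) -> int:
    """Sum of linear distances from the verb (V) to its subject (S) and object (O).

    Assumes a hypothetical three-word clause in which each argument is a
    single word, so the positions are simply the indices 0, 1 and 2.
    """
    pos = {role: i for i, role in enumerate(order)}
    return abs(pos["V"] - pos["S"]) + abs(pos["V"] - pos["O"])


for perm in permutations("SVO"):
    order = "".join(perm)
    print(order, verb_dependency_length(order))
# SVO 2, SOV 3, VSO 3, VOS 3, OSV 3, OVS 2
```

With longer, multi-word arguments the absolute numbers change, but placing the verb between its arguments still keeps both dependencies as short as possible.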

Finally, we should not underestimate the role of semantics and encyclopaedic knowledge. In most situations, it is a dog that bites a man or a police officer who captures a thief, and not the other way round. This information can be important for the use of the cues. For example, there is a correlation between the predictability of events and the use of overt object marking in Japanese (KURUMADA; JAEGER, 2015[33]). Abstract referential features, such as animacy and identifiability, play an important role in differential marking, as in Spanish or Hebrew, and in probabilistic case marker use, as in Korean (LEE, 2009). There is a negative correlation between predictability and marking, which can be explained by efficiency considerations (JÄGER, 2007[45]; LEVSHINA, 2018[14]).

If the idea of efficient trade-offs is correct, we can expect negative correlations between all these cues (cf. SINNEMÄKI, 2008[22]). Previous quantitative studies have shown a negative correlation between argument marking and rigid word order (SINNEMÄKI, 2014[23]), as well as an association between zero argument marking and verb-medial order (SINNEMÄKI, 2010[46]). The correlation between verb-final position and case marking is well known as Greenberg’s (1966b[47]) Universal 41: “If in a language the verb follows both the nominal subject and nominal object as the dominant order, the language almost always has a case system”. However, the three parameters have never been investigated simultaneously. Also, for the first time, these parameters will be estimated from corpora rather than from grammars, as was done in previous studies. As will become clear, the parameters are gradient and should be treated as continuous variables. I will first present a series of pairwise correlations between these parameters. It will be shown that taking the third variable into account can change the picture significantly, which means that the idea of studying trade-offs between two variables only is highly questionable. The correlational analyses will allow us to formulate a hypothesis about the relationships between all three cues, which will be tested in a causal analysis.
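Why a third variable can change a pairwise correlation can be illustrated with the first-order partial correlation coefficient, which measures the association between x and y while holding z constant. The correlation values in the sketch are invented for illustration and are not results of this study: in the first example a negative pairwise correlation disappears once z is controlled for, and in the second it even changes sign. Partial correlation is only one, linear way of taking a third variable into account; a causal analysis is a different and more general technique.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: x = case marking, y = rigid word order, z = verb-medial order
print(round(partial_correlation(r_xy=-0.30, r_xz=-0.60, r_yz=0.50), 3))  # 0.0: vanishes
print(round(partial_correlation(r_xy=-0.20, r_xz=-0.60, r_yz=0.60), 2))  # 0.25: sign flips
```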

2.2. Data

The language sample used for the present study includes thirty languages, which are listed in Table 1. The choice of languages was determined by the availability of sufficient data. Two sources were used: the Universal Dependencies (UD) corpora, version 2.6 (ZEMAN et al., 2020[48])6 and online news corpora of 1 million sentences from the Leipzig Corpora Collection (GOLDHAHN; ECKART; QUASTHOFF, 2012[49])7. These two different collections were used in order to ensure that our results are not due to register bias, since the UD corpora represent very diverse types of texts. Also, some UD corpora are very small. As will be demonstrated, the correlations between the parameters based on each type of data are very high, which gives us confidence in the results.

In the online news corpora, each language is represented by one million sentences from online news (categories “news” and “newscrawl”). The sentences are in random order. They were tokenized, lemmatized and annotated morphologically and syntactically with the R package udpipe (WIJFFELS, 2020[50]). The language models, which were trained on the UD corpora, provide, among other things, universal part-of-speech tags and dependency relations, which can be compared across languages. This is crucial for the purposes of the present study.

Language        ISO 639-3  Genus               Family          UD corpus     UD model
Arabic          ara        Semitic             Afro-Asiatic    ar_padt       arabic-padt-ud-2.4
Bulgarian       bul        Slavic              Indo-European   bg_btb        bulgarian-btb-ud-2.4
Croatian        hrv        Slavic              Indo-European   hr_set        croatian-set-ud-2.4
Czech           ces        Slavic              Indo-European   cs_pdt        czech-pdt-ud-2.4
Danish          dan        Germanic            Indo-European   da_ddt        danish-ddt-ud-2.4
Dutch           nld        Germanic            Indo-European   nl_alpino     dutch-alpino-ud-2.4
English         eng        Germanic            Indo-European   en_ewt        english-ewt-ud-2.4
Estonian        est        Finnic              Uralic          et_edt        estonian-edt-ud-2.4
Finnish         fin        Finnic              Uralic          fi_tdt        finnish-tdt-ud-2.4
French          fra        Romance             Indo-European   fr_gsd        french-gsd-ud-2.4
German          deu        Germanic            Indo-European   de_gsd        german-gsd-ud-2.4
Greek (modern)  ell        Greek               Indo-European   el_gdt        greek-gdt-ud-2.4
Hindi           hin        Indic               Indo-European   hi_hdtb       hindi-hdtb-ud-2.4
Hungarian       hun        Ugric               Uralic          hu_szeged     hungarian-szeged-ud-2.4
Indonesian      ind        Malayo-Sumbawan     Austronesian    id_gsd        indonesian-gsd-ud-2.4
Italian         ita        Romance             Indo-European   it_isdt       italian-isdt-ud-2.4
Japanese        jpn        Japanese            Japanese        ja_gsd        japanese-gsd-ud-2.4
Korean          kor        Korean              Korean          ko_kaist      korean-gsd-ud-2.4
Latvian         lav        Baltic              Indo-European   lv_lvtb       latvian-lvtb-ud-2.4
Lithuanian      lit        Baltic              Indo-European   lt_alksnis    lithuanian-hse-ud-2.4
Persian         pes        Iranian             Indo-European   fa_seraji     persian-seraji-ud-2.4
Portuguese      por        Romance             Indo-European   pt_bosque     portuguese-bosque-ud-2.4
Romanian        ron        Romance             Indo-European   ro_rrt        romanian-rrt-ud-2.4
Russian         rus        Slavic              Indo-European   ru_syntagrus  russian-syntagrus-ud-2.4
Slovenian       slv        Slavic              Indo-European   sl_ssj        slovenian-ssj-ud-2.4
Spanish         spa        Romance             Indo-European   es_ancora     spanish-gsd-ud-2.4
Swedish         swe        Germanic            Indo-European   sv_talbanken  swedish-talbanken-ud-2.4
Tamil           tam        Southern Dravidian  Dravidian       ta_ttb        tamil-ttb-ud-2.4
Turkish         tur        Turkic              Altaic          tr_imst       turkish-imst-ud-2.4
Vietnamese      vie        Viet-Muong          Austro-Asiatic  vi_vtb        vietnamese-vtb-ud-2.4
Table 1. Languages, UD corpora and language models used in the case study

2.3. Variables

2.3.1. Formal distinctness of Subject and Object (case marking)

Case marking was operationalized as the distinctness of the forms representing transitive subject and object, following the token-based approach in Levshina (2019[25]). This method gives more precise information about how frequently case markers can help language users to distinguish between the main participants, which matters for languages with differential and optional case marking. For example, in Russian some nouns have different forms in the Nominative and Accusative (e.g. devočk-a “girl-Nom” and devočk-u “girl-Acc”), while other nouns have identical forms (e.g. stol “table” or myš “mouse”). The question is how frequently the forms are identical and how frequently they are distinct. Similarly, languages like Japanese and Korean have variable marking of subject and object, governed by complex probabilistic rules. All this variability should be taken into account.

At the moment, there is no reliable morphological annotation that could be used to compare the forms across many different languages. Formal distinctness was therefore approximated using the existing corpora as follows. First, I extracted all nouns (wordforms in lower case, and their lemmas) with the universal syntactic dependency tags “nsubj” (nominal subject) and “obj” (object). To take into account languages like Spanish, where the object case marker a is a preposition, I also checked whether the head noun had a dependent with the relation “case”, and merged the case marker with the noun, e.g. a_mujer “woman.ACC”. Only non-plural forms were considered, in order to exclude formal variation based on number. I do not expect this restriction to influence the results strongly because plural forms are less frequent than singular ones. For languages with articles written as one word with the noun (Arabic, Bulgarian, Danish, Romanian and Swedish), subject and object forms were compared separately for definite and indefinite forms because it was too difficult to split them automatically. Indonesian possessive suffixes were not counted as part of wordforms.
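The extraction step can be sketched in Python over sentences stored as lists of token records with the CoNLL-U fields id, form, lemma, upos, feats, head and deprel. This is a hypothetical illustration, not the study's code: it assumes that a subject counts as transitive when its head also has an “obj” dependent, that plural tokens are recognized by the feature Number=Plur, and that “case” dependents are joined to the noun in their linear order. Fused articles and Indonesian possessive suffixes are not handled.

```python
def subject_object_forms(sentence, upos="NOUN"):
    """Collect (role, lemma, form) triples for transitive subjects and objects.

    Assumptions (illustrative only): a subject is transitive if its head also
    has an "obj" dependent; tokens with Number=Plur are skipped; "case"
    dependents are merged with the noun in linear order, e.g. "a_mujer".
    """
    heads_with_obj = {t["head"] for t in sentence if t["deprel"] == "obj"}
    triples = []
    for tok in sentence:
        if tok["upos"] != upos or tok["deprel"] not in ("nsubj", "obj"):
            continue
        if tok["deprel"] == "nsubj" and tok["head"] not in heads_with_obj:
            continue  # intransitive subject
        if "Number=Plur" in tok["feats"].split("|"):
            continue  # exclude formal variation due to number
        # merge case markers (adpositions, case particles) with the noun
        parts = [t for t in sentence if t["head"] == tok["id"] and t["deprel"] == "case"]
        parts.append(tok)
        parts.sort(key=lambda t: t["id"])
        form = "_".join(t["form"].lower() for t in parts)
        triples.append((tok["deprel"], tok["lemma"].lower(), form))
    return triples
```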

Next, for every lemma used both as a transitive subject and as an object in the corpus, the subject and object forms were listed. One subject form and one object form were selected at random and compared. The number of lemmas with distinct forms was then computed for each language, with each lemma weighted by its frequency, so that frequent lemmas had more weight than rare ones. Finally, this weighted count was divided by the total token frequency of all lemmas analyzed, which yields a distinctness score between 0 and 1.
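The scoring step can be sketched as follows, assuming input triples of the form (role, lemma, form). The weighting scheme (each lemma weighted by its combined token frequency as subject and object) and the fixed random seed are assumptions made for illustration; the text does not specify them in that detail.

```python
import random
from collections import defaultdict

def distinctness_score(triples, seed=1):
    """Frequency-weighted share of lemmas whose subject and object forms differ.

    `triples` are (role, lemma, form) tuples with role "nsubj" or "obj".
    For every lemma attested in both roles, one subject form and one object
    form are drawn at random from its token occurrences and compared. Each
    lemma is weighted by its token frequency in both roles (an assumption).
    """
    rng = random.Random(seed)  # fixed seed so the random draws are reproducible
    forms = {"nsubj": defaultdict(list), "obj": defaultdict(list)}
    for role, lemma, form in triples:
        forms[role][lemma].append(form)
    shared = sorted(set(forms["nsubj"]) & set(forms["obj"]))
    weighted_distinct = total = 0
    for lemma in shared:
        subj, obj = forms["nsubj"][lemma], forms["obj"][lemma]
        freq = len(subj) + len(obj)
        if rng.choice(subj) != rng.choice(obj):
            weighted_distinct += freq
        total += freq
    return weighted_distinct / total if total else float("nan")
```

The fixed seed makes a single run reproducible; repeating the random draws with several seeds and averaging would show how sensitive a language's score is to the random choice of forms.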

Following previous research (e.g. SINNEMÄKI, 2008[22]) and the tradition in typology, the analyses presented below were performed on subjects and objects expressed by common nouns (Universal Part of Speech tag “NOUN”). However, I also computed scores for all possible subjects and objects (including pronouns, different nominalizations, symbols, proper nouns, etc.) and compared them with the ones based on nouns only. The correlations between the scores based only on nouns and those based on all possible lexemes are very strong and positive: r = 0.92, p < 0.001 in the UD corpora; r = 0.98, p < 0.001 in the online news corpora.

The formal distinctness scores based on the UD corpora and the online news corpora are displayed in Figure 2. The languages at the bottom have no or very limited case marking, whereas those at the top have systematic case morphology. Languages in the middle have diverse types of differential case marking, where the presence or absence of markers is determined by the semantic or pragmatic properties of the referent, lexical class, tense, aspect and other factors. Examples are Russian, where object forms differ from subject forms only in some noun classes (e.g. animate masculine nouns and feminine nouns in -a); Turkish, where definite and specific indefinite objects are marked; and Hindi, which has a complex case system in which the ergative marker is added to subjects in perfective clauses, whereas specific human objects are usually marked with the accusative case.

There is a very strong correlation between the scores based on the two data sources: r = 0.952, p < 0.001. It is not clear what explains the large discrepancies for Tamil, Lithuanian and Korean. Possible reasons include the small size of the available UD corpora and noise in the automatically parsed online news corpora.

Indexing of subject and object (agreement) is not investigated in this paper. Previous research has shown that subject agreement is not significantly correlated with word order or case marking, whereas object agreement correlates negatively with the presence of both factors simultaneously (SINNEMÄKI, 2008[22]). Unfortunately, my sample of languages does not allow me to test object agreement statistically. I leave that to future research.

Figure 2. Proportions of distinct subject and object forms in the UD corpora and online news

2.3.2. Word order rigidity

If the order of subject and object is fixed, it can serve as a reliable cue to their syntactic roles. To measure word order rigidity, I used anti-entropy, defined as 1 minus the Shannon entropy of the relative order of subject and object. Shannon entropy has previously been used to represent word order flexibility (LEVSHINA, 2019[25]). The entropy of the orders SO and OS is computed as follows:

$$H = -\left[\Pr(SO)\,\log_2 \Pr(SO) + \Pr(OS)\,\log_2 \Pr(OS)\right] \qquad (2)$$

where the probabilities of SO and OS were computed as simple proportions of each word order taken from the corpora.

The entropy is minimal (H = 0) when either the subject always precedes the object or vice versa, i.e. Pr(SO) = 1 and Pr(OS) = 0, or Pr(SO) = 0 and Pr(OS) = 1. It is maximal when both orders are equally probable: Pr(SO) = Pr(OS) = 0.5. With a base-2 logarithm, H ranges from 0 to 1, so anti-entropy ranges from 0 (maximally flexible order) to 1 (fully rigid order). The anti-entropy scores based on the UD corpora and the online news corpora are displayed in Figure 3. As in the previous section, these scores are based only on common nouns. The correlation between the rigidity scores in the UD corpora and in the news is positive and high: r = 0.895, p < 0.001. The scores based only on nouns and those based on all possible slot fillers also correlate strongly and positively: r = 0.74, p < 0.001 in the UD corpora, r = 0.85, p < 0.001 in the online news corpora.
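The anti-entropy measure can be computed directly from the counts of SO and OS clauses. The sketch below assumes a base-2 logarithm, which keeps both entropy and anti-entropy between 0 and 1, and uses the convention that a zero-probability order contributes nothing to the entropy, so that fully rigid orders are handled. The function name is illustrative.

```python
import math

def anti_entropy(n_so, n_os):
    """Word order rigidity as 1 minus the base-2 Shannon entropy of SO vs OS.

    n_so and n_os are corpus counts of clauses with subject-before-object and
    object-before-subject order. Orders with zero probability contribute 0.
    """
    total = n_so + n_os
    entropy = 0.0
    for count in (n_so, n_os):
        p = count / total
        if p > 0:
            entropy -= p * math.log2(p)
    return 1.0 - entropy

print(anti_entropy(50, 50))            # 0.0: both orders equally frequent
print(anti_entropy(100, 0))            # 1.0: subject always precedes object
print(round(anti_entropy(75, 25), 3))  # 0.189
```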

Figure 3. Word order rigidity (anti-entropy) scores of subject and object

2.3.3. Position of the verb

The third variable was verb-medialness, which shows how frequently the head verb occurs between subject and object. The procedure was as follows. I counted all clauses (main and finite subordinate clauses) with an overt subject and object (“nsubj” and “obj” relations). Next, I computed the proportion of these clauses in which the lexical verb stands between subject and object. The scores based on the UD corpora and the online news corpora are displayed in Figure 4. The correlation between the scores in the UD corpora and in the online news is nearly perfect: r = 0.992, p < 0.001. There is a clear gap between the strictly SOV languages (Japanese, Tamil, Korean, Hindi and Turkish), which have the lowest scores, and all the rest, where SVO is frequent. French, English and Indonesian have the highest scores. The languages in the middle have variable SVO/SOV order (Dutch, German and Hungarian), with the exception of Arabic (SVO/VSO). The scores for common nouns presented in Figure 4 correlate nearly perfectly with the scores based on all lexemes: r = 0.96, p < 0.001 for the UD corpora, and r = 0.98, p < 0.001 for the news corpora.
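A simplified version of the verb-medialness count might look like this in Python. It assumes sentences stored as lists of token records with the CoNLL-U fields id, upos, head and deprel, treats any VERB token with both an “nsubj” and an “obj” dependent as a clause, and does not check finiteness or distinguish main from subordinate clauses, so it only approximates the procedure described above. The function and parameter names are hypothetical; setting `arg_upos=None` counts subjects and objects of any part of speech, which corresponds to the all-lexemes variant.

```python
def verb_medial_proportion(sentences, arg_upos="NOUN"):
    """Share of clauses with overt subject and object where the verb is medial.

    Simplifying assumptions: a clause is any VERB token with both an "nsubj"
    and an "obj" dependent of the given part of speech (arg_upos=None accepts
    any part of speech); the first subject and first object are used.
    """
    medial = clauses = 0
    for sent in sentences:
        for verb in sent:
            if verb["upos"] != "VERB":
                continue
            deps = [t for t in sent if t["head"] == verb["id"]
                    and (arg_upos is None or t["upos"] == arg_upos)]
            subj = [t for t in deps if t["deprel"] == "nsubj"]
            obj = [t for t in deps if t["deprel"] == "obj"]
            if not subj or not obj:
                continue
            clauses += 1
            s, o = subj[0]["id"], obj[0]["id"]
            if min(s, o) < verb["id"] < max(s, o):  # verb between the two arguments
                medial += 1
    return medial / clauses if clauses else float("nan")
```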

Figure 4. Proportion of clauses with head verb between subject and object

2.4. Correlations

This section tests the relationships between the three types of cues. Recall that a trade-off requires a negative correlation between two parameters. Let us test if this requirement is met. Figure 5 displays Spearman’s rank-based correlations between the pairs of variables. The results for both data sources are very similar.
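Spearman's coefficient is Pearson's correlation computed on ranks, with tied values receiving the average of the ranks they occupy. The pure-Python sketch below shows that computation. In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead, and the scores for five languages in the usage lines are hypothetical, not the study's data.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical rigidity and distinctness scores for five languages
rigidity = [0.9, 0.8, 0.3, 0.5, 0.2]
distinct = [0.1, 0.2, 0.9, 0.6, 0.7]
print(spearman(rigidity, distinct))  # -0.9
```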

Figure 5. Correlations between word order rigidity, formal distinctness of subject and object and verb-medialness in the UD corpora (left) and in the online news (right)

The correlation between rigid word order and formal distinctness is negative: more rigid word order means less distinct subject and object forms (p < 0.001). It is also instructive to look at the scatter plot with language names in Figure 6, which shows this relationship in more detail. It tells us that languages with similar forms (the left-hand side of the corresponding small plot) indeed have rigid word order, but that languages with less similar forms are more variable with regard to word order rigidity. For example, Finnish, Japanese, Korean and Persian have highly distinct forms but quite rigid word order, while Hungarian and Tamil also have distinct forms but variable word order. This means that the trade-off is not perfectly symmetric, and the relationship is to some extent implicational rather than fully correlational: lack of formal distinctions strongly implies rigid word order, but rigid word order implies low formal distinctness less strongly, as shown by Finnish, Korean, Japanese and Persian.

Figure 6. Scatterplot of distinct forms and rigid word order of subject and object in the UD corpora

The next correlation is between distinct forms and verb medialness. The correlation is again negative, as predicted (p < 0.001). Therefore, high formal distinctness should mean that the verb is less frequently in the middle, and low formal distinctness should mean that the verb is more frequently in the middle. However, the scatter plot shown in Figure 7 suggests again that this is a simplification. When the forms are not distinct, the verb is typically between subject and object, as the large cluster of languages in the bottom right corner shows. Yet, when the forms are distinct, the verb can be anywhere. For example, it is rarely medial in Turkish, Hindi, Japanese, Korean and Tamil (see top left corner), but usually medial in the Baltic and Finnic languages (see top right corner). This relationship is even more obviously implicational than in the previous plot.

Figure 7. Scatterplot of verb medialness and distinct forms of subject and object in the UD corpora

Finally, we observe a positive correlation between rigid word order and verb-medialness. This finding is similar to the results reported by Sinnemäki (2010[46]), who used categorical data from a large sample of typologically diverse languages. The positive correlation is a case of cue redundancy. The distribution of the scores is shown in Figure 8. We can see that very rigid word order in French, Indonesian or English is strongly associated with verb-medial position, but the verb-final languages on the left behave in very diverse ways.

Figure 8. A scatterplot of verb medialness and rigid word order

So far, we have discussed pairwise correlations that did not take into account the presence of the third variable. However, this analysis is incomplete because when testing the correlation between two types of cues, we need to control for the third one. In order to do so, one can use partial correlation coefficients. They are shown in Table 2.

| | Rigid Word Order | Distinct Forms | Medial Verb |
|---|---|---|---|
| Rigid Word Order | | UD: -0.62 (p < 0.001); news: -0.57 (p = 0.001) | UD: 0.04 (p = 0.805); news: 0.10 (p = 0.588) |
| Distinct Forms | UD: -0.62 (p < 0.001); news: -0.57 (p = 0.001) | | UD: -0.44 (p = 0.016); news: -0.49 (p = 0.007) |
| Medial Verb | UD: 0.04 (p = 0.805); news: 0.10 (p = 0.588) | UD: -0.44 (p = 0.016); news: -0.49 (p = 0.007) | |
Table 2. Partial correlations between the cues in the UD corpora and in the online news
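For three variables, the partial correlation between x and y controlling for z follows from the three pairwise coefficients: r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The paper does not state whether Table 2 is based on Pearson or Spearman coefficients, so the sketch below simply applies this standard first-order formula to whatever pairwise coefficients it is given. The input values are hypothetical, chosen only to show how a positive pairwise correlation can vanish once the third variable is controlled for.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical coefficients: x = rigid word order, y = verb-medialness,
# z = formal distinctness (illustrative values, not the study's results).
r_rigid_medial, r_rigid_distinct, r_medial_distinct = 0.40, -0.70, -0.60
print(round(partial_correlation(r_rigid_medial, r_rigid_distinct, r_medial_distinct), 3))  # -0.035
```

Here a moderate positive pairwise correlation (0.40) is almost entirely accounted for by the fact that both variables are negatively related to the third one.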

The coefficients for the UD corpora and the online news corpora are similar, which suggests that the results are robust across data sources. The correlation between formal distinctness and rigid word order is the strongest one, followed by the negative correlation between formal distinctness and verb-medialness. This ordering matches the pairwise results, although the partial correlations are weaker. The most striking difference is that the correlation between rigid word order and verb-medialness disappears when formal distinctness is taken into account.

One may object that the data points are not independent, because many of the languages come from the same families and genera (namely, Baltic, Germanic, Romance, Slavic and Finnic). Since traditional correlational tests assume independent observations, they are not fully appropriate for such data. However, additional tests (LEVSHINA, In preparation[51]) based on permutation and resampling support the quantitative results presented here.
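One way to reduce the influence of related languages, in the spirit of the one-language-per-genus resampling mentioned in Section 2.5, is to repeatedly draw a single language from each genus, recompute the statistic on that subsample and average over draws. The sketch below illustrates the idea for a correlation between two scores. It is not the procedure of Levshina (in preparation), whose details are not given here; the function names, parameters and toy data are hypothetical, and a rank-based statistic could be passed as `stat` instead of Pearson's r. It needs at least three genera with non-constant scores in every draw.

```python
import math
import random
from collections import defaultdict

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def genus_resampled_stat(genus_of, x, y, stat=pearson, n_draws=1000, seed=0):
    """Average `stat` over samples containing exactly one language per genus.

    genus_of: dict language -> genus; x, y: dicts language -> score.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for lang in sorted(genus_of):          # sorted for reproducible draws
        members[genus_of[lang]].append(lang)
    values = []
    for _ in range(n_draws):
        sample = [rng.choice(langs) for langs in members.values()]
        values.append(stat([x[l] for l in sample], [y[l] for l in sample]))
    return sum(values) / len(values)
```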

2.5. From correlation to causation

The quantitative analyses have revealed a negative correlation between rigid word order and distinct forms of subject and object. We also found a negative correlation between distinct forms and medial position of the verb. Rigid word order and verb-medialness are correlated positively, but this correlation disappears when the formal distinctness is taken into account. This supports the idea of Fenk-Oczlon and Fenk (2008[21]) that trade-offs are more likely to be observed between different linguistic domains (e.g. syntax and morphology, or semantics and phonology) than within the same domain (see also SINNEMÄKI, 2008[22]).

We also saw in the scatter plots that languages lacking formal distinctness have rigid word order and tend to have the verb in the middle. One might therefore think that a lack of formal distinctness causes language users to provide cues by means of word order. From the opposite perspective, one could equally say that languages with rigid word order have low formal distinctness, whereas SOV languages tend to have high distinctness, so that word order explains case marking. So what is the direction of causality: from word order to case marking, or the other way round?

Some arguments in the literature suggest that word order can determine case marking. According to Kiparsky (1996[52]), the shift to VO in Old English began before the collapse of the case system (and also before the loss of subject-verb agreement). Similarly, Bauer (2009[53]) shows that the change to VO and rigid word order in Late and Vulgar Latin preceded the loss of inflection in Romance. According to one hypothesis, Indo-European languages drift from SOV to SVO and rigid word order, which leads to the loss of inflections (KOCH, 1974[54]). Since most of the languages in our sample are Indo-European, this drift may explain the correlations we observe.

There is also experimental evidence for a causal link from word order to case marking. In a study by Fedzechkina, Newport and Jaeger (2016[55]), learners were presented with miniature artificial languages that had optional case marking and either fixed or flexible constituent order. Learners of the fixed-order language used case marking significantly less often than the other learners, and less often than in the input language, which indicates that rigid word order can trigger the loss of distinct forms. At the same time, the word order properties of the input languages remained stable.

To test this hypothesis, we need to move from bivariate correlations to multivariate causal analysis (BLASI; ROBERTS, 2017[56]). A causal analysis using the PC algorithm (SPIRTES; GLYMOUR; SCHEINES, 2000[57]; KALISCH et al., 2012[58]) produces the directed acyclic graph shown in Figure 9. The arrows represent the direction of the effect of one variable on another, at a significance level of 0.05. The results for the UD corpora and the online news data are identical. Similar results are obtained with a resampling method in which one language per genus is drawn 1,000 times, the probability of every link is logged, and the average probability is computed (LEVSHINA, In preparation[51]).
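The study ran the PC algorithm with the R package pcalg. To make the logic of the algorithm concrete, the following pure-Python sketch (neither pcalg nor the author's code) implements its two core steps for a handful of continuous variables. It removes an edge whenever Fisher's z test on a (partial) correlation fails to reject independence at level alpha, and it then orients each unshielded triple x, z, y as a v-structure x → z ← y when z is not in the set that separated x and y. The further orientation rules of the full algorithm are omitted. The variable names and synthetic data in the usage lines are hypothetical, constructed so that two exactly uncorrelated word-order scores both feed into a distinctness score.

```python
import itertools
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(data, x, y, given):
    """Partial correlation of x and y given a list of variables (recursive formula)."""
    if not given:
        return pearson(data[x], data[y])
    z, rest = given[0], given[1:]
    rxy = partial_corr(data, x, y, rest)
    rxz = partial_corr(data, x, z, rest)
    ryz = partial_corr(data, y, z, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value of Fisher's z test with k conditioning variables."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def pc_sketch(data, alpha=0.05):
    """Skeleton search plus v-structure orientation; returns (edges, arrows)."""
    names = list(data)
    n = len(data[names[0]])
    adj = {v: set(names) - {v} for v in names}  # start from a complete graph
    sepset = {}
    level = 0  # size of the conditioning sets tested at this stage
    while any(len(adj[v]) - 1 >= level for v in names):
        for x, y in itertools.permutations(names, 2):
            if y not in adj[x]:
                continue
            for s in itertools.combinations(sorted(adj[x] - {y}), level):
                r = partial_corr(data, x, y, list(s))
                if fisher_z_pvalue(r, n, level) > alpha:  # independence not rejected
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepset[frozenset((x, y))] = set(s)
                    break
        level += 1
    arrows = set()
    for z in names:
        for x, y in itertools.combinations(sorted(adj[z]), 2):
            if y not in adj[x] and z not in sepset[frozenset((x, y))]:
                arrows.update({(x, z), (y, z)})  # v-structure x -> z <- y
    edges = {frozenset((x, y)) for x in names for y in adj[x]}
    return edges, arrows

# Hypothetical synthetic scores: "rigid" and "medial" are exactly uncorrelated,
# and "distinct" depends on both of them.
rigid = [1, -1, 1, -1] * 100
medial = [1, 1, -1, -1] * 100
distinct = [r + m + 0.5 * r * m for r, m in zip(rigid, medial)]
edges, arrows = pc_sketch({"rigid": rigid, "medial": medial, "distinct": distinct})
print(sorted(arrows))  # [('medial', 'distinct'), ('rigid', 'distinct')]
```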

Figure 9. Causal analysis of three types of cues

The graph shows that both word order variables contribute jointly to the distinctness of subject and object forms, while the two word order variables are not causally related to each other. This is in line with the results of the partial correlation analysis, and the effect of word order on formal distinctness supports the theoretical claims from the literature discussed above. What is new is that verb position, and not only word order rigidity, affects formal distinctness. In particular, we can hypothesize that verb-finalness increases the distinctness of subject and object.

3. Discussion

This paper has discussed a popular idea in functional linguistics, namely, that different costs or benefits are in relationships of efficient trade-offs, which can be thought of as Pareto frontiers. I argued that there are many conceptual and methodological problems with that idea. First, it is difficult to identify the exact nature of costs and benefits. Second, a negative correlation between costs or benefits does not always mean that the language user can make a rational choice. Third, binary trade-offs ignore other relevant costs and benefits. Therefore, it would be safer to drop the term “trade-off” altogether.

In game theory and economics, the situation of Pareto efficiency is also known as a zero-sum game, where the interacting parties’ aggregate gains and losses add up to zero. It has been argued, however, that there is an increasing chance of finding non-zero-sum solutions as the complexity of a system increases (WRIGHT, 2000[59]). Language as a highly complex system is not a zero-sum game.

As an illustration, I presented a case study of three types of cues that help to differentiate between subject and object: rigid word order, medial position of the verb and formal distinctness of the arguments provided by case morphemes and adpositions. The results of correlational analyses demonstrate that not all cues are efficiently related. There can be redundancy in the amount of information available to the addressee. Also, we have seen that some relationships are more implicational than correlational, which also leads to cue redundancy. The only thing disfavoured by the languages is the absence of any cues. It seems that a breakdown of communication (with additional costs of reanalysis and conversational repair) is more dangerous than wasting the resources. This conclusion is in line with typological evidence, which suggests that all languages have some amount of redundancy (HENGEVELD; LEUFKENS, 2018[19]).

Taking the speaker’s perspective, we can say that the speaker saves effort by providing less overt coding when the word order provides sufficient information. This is efficient behaviour, but it is difficult to treat it as a real trade-off because, unlike the articulatory effort required to produce case marking, it is not clear what kind of costs word order has for the speaker (see also the discussion in Section 1.1). Also, the existence of languages with case marking but fairly rigid and verb-medial word order suggests that the speaker’s behaviour is not always efficient.

At the moment, we do not know what the costs of acquiring more or less flexible word order are for learners. I leave the question of trade-offs in language acquisition open.

Finally, I argued that bivariate correlations should be replaced with multivariate causal analysis and showed how this can be done for the three types of cues. This study has demonstrated that word order determines case marking, but not the other way round. It seems that fixed word order allows case marking to disappear. Also, it may be that verb-final languages tend to develop and maintain case forms. These causal hypotheses are preliminary and need to be further investigated on a larger sample without the Indo-European bias. Other linguistic and extralinguistic factors, such as agreement, semantics, population size and the presence of intensive language contact, should also be taken into account.

It is easy to understand why the idea of a trade-off is appealing: it is very simple and intuitive. If you take a larger slice of a cake, the others will get less. In fact, people have a bias towards zero-sum thinking, which operates both at the individual level and as a cultural worldview (RÓŻYCKA-TRAN; BOSKI; WOJCISZKE, 2015[60]). Zero-sum thinking makes people choose win-lose strategies instead of looking for win-win solutions, a tendency that has become all too obvious in world politics today. Our task as scientists is to prevent people from falling into this cognitive trap and, of course, not to make this mistake ourselves.

Acknowledgements

The research in this paper was funded by the Netherlands Organisation for Scientific Research (NWO) under Gravitation grant Language in Interaction, grant number 024.001.006. I also sincerely thank Mira Ariel, Sterre Leufkens and Kaius Sinnemäki for their insightful comments and constructive feedback, which have helped me to improve the paper substantially. All remaining errors are solely mine.

References

ARIEL, Mira. Accessing Noun-Phrase Antecedents. London: Routledge, 1990.

ARIEL, Mira. “Or Constructions: Monosemy versus polysemy”. In: MACWHINNEY, Brian; MALCHUKOV, Andrej; MORAVCSIK, Edith A., Competing Motivations. Oxford: Oxford University Press, 2014, p. 333-347. DOI https://doi.org/10.1093/acprof:oso/9780198709848.001.0001

BAAYEN, R. Harald; MILIN, Petar; RAMSCAR, Michael. Frequency in lexical processing. Aphasiology, 30(11), p. 1174–1220, 2016. DOI https://doi.org/10.1080/02687038.2016.1147767

BAUER, Brigitte M. “Word order”. In: BALDI, Philip; CUZZOLIN, Pierluigi. New Perspectives on Historical Latin Syntax: Vol 1: Syntax of the Sentence. Berlin: Mouton de Gruyter, 2009, p. 241-316.

BLASI, Damián E.; ROBERTS, Seán G. “Beyond binary dependencies in language structure”. In: ENFIELD, Nick J., Dependencies in Language. Berlin: Language Science Press, 2017, p. 117–128. DOI https://doi.org/10.5281/zenodo.573774

CLARK, Herbert H. Using Language. Cambridge: Cambridge University Press, 1996.

CLARK, Herbert H.; WILKES-GIBBS, Diana. Referring as a collaborative process. Cognition, 22(1), p. 1-39, 1986. DOI https://doi.org/10.1016/0010-0277(86)90010-7

CROFT, William A. On being a student of Joe Greenberg. Linguistic Typology, 6(1), p. 3–8, 2002. DOI https://doi.org/10.1515/lity.2002.001

DELBRÜCK, Berthold. Einleitung in das Studium der indogermanischen Sprachen. 5th ed. Leipzig: Breitkopf & Härtel, 1908. https://archive.org/details/einleitungindas00delbgoog

DU BOIS, John. “Competing motivations”. In: HAIMAN, John. Iconicity in Syntax. Amsterdam: John Benjamins, 1985, p. 343-365.

FEDZECHKINA, Maryia; NEWPORT, Elissa L.; JAEGER, T. Florian. Balancing Effort and Information Transmission During Language Acquisition: Evidence From Word Order and Case Marking. Cognitive Science, 41(2), p. 416-446, 2016. DOI https://doi.org/10.1111/cogs.12346

FENK-OCZLON, Gertraud; FENK, August. “Complexity trade-offs between the subsystems of language”. In: MIESTAMO, Matti; SINNEMÄKI, Kaius; KARLSSON, Fred, Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 2008, p. 43–65.

FERRER-I-CANCHO, Ramon. Why do syntactic links not cross? Europhysics Letters, 76(6), p. 1228-1234, 2006. DOI https://doi.org/10.1209/epl/i2006-10406-0

FERRER-I-CANCHO, Ramon. The placement of the head that maximizes predictability. An information theoretic approach. Glottometrics, 39, p. 38-71, 2017.

GELL-MANN, Murray. What is complexity? Complexity, 1(1), p. 16-19, 1995. DOI https://doi.org/10.1002/cplx.6130010105

GIBSON, Edward. “The dependency locality theory: A distance-based theory of linguistic complexity”. In: MARANTZ, Alec P.; MIYASHITA, Yasushi; O’NEIL, Wayne, Image, Language, Brain: Papers from the First Mind Articulation Project Symposium. Cambridge, MA: MIT Press, 2000, p. 95–126.

GIBSON, Edward; FUTRELL, Richard; PIANTADOSI, Steven; DAUTRICHE, Isabelle; MAHOWALD, Kyle; BERGEN, Leon; LEVY, Roger. How Efficiency Shapes Human Language. Trends in Cognitive Sciences, 23(5), p. 389-407, 2019. DOI https://doi.org/10.1016/j.tics.2019.02.003

GOLDHAHN, Dirk; ECKART, Thomas; QUASTHOFF, Uwe. “Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages”. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; DECLERCK, Thierry; et al., Proceedings of the Eighth International Conference on Language Resources and Evaluation. Istanbul: ELRA, 2012, p. 759-765. http://www.lrec-conf.org/proceedings/lrec2012/pdf/327_Paper.pdf

GREENBERG, Joseph H. Language Universals, With Special Reference to Feature Hierarchies. The Hague: Mouton, 1966a.

GREENBERG, Joseph H. “Some universals of grammar with particular reference to the order of meaningful elements”. In: GREENBERG, Joseph H., Universals of Language. Cambridge, MA: MIT Press, 1966b, p. 73-113.

HAIMAN, John. Iconic and economic motivation. Language, 59(4), p. 781-819, 1983.

HALE, John. Uncertainty about the rest of the sentence. Cognitive Science, 30(4), p. 643-672, 2006. DOI https://doi.org/10.1207/s15516709cog0000_64

HALL, Matthew L.; MAYBERRY, Rachel I.; FERREIRA, Victor S. Cognitive constraints on constituent order: evidence from elicited pantomime. Cognition, 129(1), p. 1-17, 2013. DOI https://doi.org/10.1016/j.cognition.2013.05.004

HARMON, Zara; KAPATSINSKI, Vsevolod. Putting old tools to novel uses: The role of form accessibility in semantic extension. Cognitive Psychology, 98, p. 22-44, 2017. DOI https://doi.org/10.1016/j.cogpsych.2017.08.002

HASPELMATH, Martin. “Creating economical morphosyntactic patterns in language change”. In: GOOD, Jeff, Language Universals and Language Change. Oxford: Oxford University Press, 2008, p. 185-214.

HASPELMATH, Martin. “On system pressure competing with economic motivation”. In: MACWHINNEY, Brian; MALCHUKOV, Andrej; MORAVCSIK, Edith A., Competing Motivations. Oxford: Oxford University Press, 2014, p. 197-208.

HASPELMATH, Martin; KARJUS, Andres. Explaining asymmetries in number marking: Singulatives, pluratives and usage frequency. Linguistics, 55(6), p. 1213-1235, 2017. DOI https://doi.org/10.1515/ling-2017-0026

HAWKINS, John. A Comparative Typology of English and German. Unifying the contrasts. London: Croom Helm, 1986.

HAWKINS, John. Efficiency and Complexity in Grammars. Oxford: Oxford University Press, 2004.

HENGEVELD, Kees; LEUFKENS, Sterre. Transparent and non-transparent languages. Folia Linguistica, 52(1), p. 139–175, 2018. DOI https://doi.org/10.1515/flin-2018-0003

HOLLER, Judith; KENDRICK, Kobin H.; LEVINSON, Stephen C. Processing language in face-to-face conversation: Questions with gestures get faster responses. Psychonomic Bulletin & Review, 25(5), p. 1900-1908, 2018. DOI https://doi.org/10.3758/s13423-017-1363-z.

HOLLER, Judith; LEVINSON, Stephen C. Multimodal language processing in human communication. Trends in Cognitive Sciences, 23(8), p. 639-652, 2019. DOI https://doi.org/10.1016/j.tics.2019.05.006

JÄGER, Gerhard. Evolutionary Game Theory and Typology. A Case Study. Language, 83(1), p. 74-109, 2007.

JAEGER, T. Florian. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1), p. 23-62, 2010. DOI https://doi.org/10.1016/j.cogpsych.2010.02.002

JAEGER, T. Florian; BUZ, Esteban. “Signal reduction and linguistic encoding”. In: FERNÁNDEZ, Eva M.; SMITH CAIRNS, Helen, The Handbook of Psycholinguistics. Hoboken, NJ: John Wiley & Sons, 2017, p. 38-81. DOI https://doi.org/10.1002/9781118829516.ch3

JAEGER, T. Florian; TILY, Harry J. On language “utility”: Processing complexity and communicative efficiency. Wiley Interdisciplinary Reviews: Cognitive Science, 2(3), p. 323–335, 2011. DOI https://doi.org/10.1002/wcs.126

KALISCH, Markus; MÄCHLER, Martin; COLOMBO, Diego; MAATHUIS, Marloes H.; BÜHLMANN, Peter. Causal Inference Using Graphical Models with the R Package pcalg. Journal of Statistical Software, 47(11), p. 1-26, 2012. DOI https://doi.org/10.18637/jss.v047.i11

KELLER, Rudi. On Language Change: The Invisible Hand in Language. London: Routledge, 1994.

KEMP, Charles; XU, Yang; REGIER, Terry. Semantic Typology and Efficient Communication. Annual Review of Linguistics, 4, p. 109-128, 2018. DOI https://doi.org/10.1146/annurev-linguistics-011817-045406

KIPARSKY, Paul. “The Shift to Head-initial VP in Germanic”. In: THRÁINSSON, Höskuldur; EPSTEIN, Samuel D.; PETER, Steve, Studies in Comparative Germanic Syntax II. Dordrecht: Kluwer, 1996, p. 140-179.

KOCH, Monika. A Demystification of Syntactic Drift. Montreal Working Papers in Linguistics, 3, p. 63-114, 1974.

KOPLENIG, Alexander; MEYER, Peter; WOLFER, Sascha; MÜLLER-SPITZER, Carolin. The statistical trade-off between word order and word structure – Large-scale evidence for the principle of least effort. PLoS ONE, 12(3), e0173614, 2017. DOI https://doi.org/10.1371/journal.pone.0173614

KURUMADA, Chigusa; JAEGER, T. Florian. Communicative efficiency in language production: Optional case-marking in Japanese. Journal of Memory and Language, 83, p. 152-178, 2015. DOI https://doi.org/10.1016/j.jml.2015.03.003

LEVSHINA, Natalia. Towards a Theory of Communicative Efficiency in Human Languages. Habilitation thesis. Leipzig University, 2018. DOI http://doi.org/10.5281/zenodo.1542857

LEVSHINA, Natalia. Token-based typology and word order entropy. Linguistic Typology, 23(3), p. 533–572, 2019. DOI https://doi.org/10.1515/lingty-2019-0025

LEVSHINA, Natalia. In preparation. Bounded rationality and limited efficiency: A correlational and causal analysis of subject and object cues in thirty languages.

MARTINET, André. Grundzüge der Allgemeinen Sprachwissenschaft. Stuttgart: Kohlhammer, 1963.

MCWHORTER, John H. The world’s simplest grammars are creole grammars. Linguistic Typology, 5(2-3), p. 125-166, 2001. DOI https://doi.org/10.1515/lity.2001.001

PIANTADOSI, Steven T.; TILY, Harry; GIBSON, Edward. Word lengths are optimized for efficient communication. PNAS, 108(9), p. 3526–3529, 2011. DOI https://doi.org/10.1073/pnas.1012551108

PIANTADOSI, Steven T.; TILY, Harry; GIBSON, Edward. The communicative function of ambiguity in language. Cognition, 122, p. 280-291, 2012. DOI https://doi.org/10.1016/j.cognition.2011.10.004

RÓŻYCKA-TRAN, Joanna; BOSKI, Paweł; WOJCISZKE, Bogdan. Belief in a zero-sum game as a social axiom: A 37-Nation Study. Journal of Cross-Cultural Psychology, 46(4), p. 525–548, 2015. DOI https://doi.org/10.1177/0022022115572226

SAPIR, Edward. Language: An Introduction to the Study of Speech. New York: Harcourt, 1921.

SHOSTED, Ryan K. Correlating complexity: A typological approach. Linguistic Typology, 10(1), p. 1-40, 2006. DOI https://doi.org/10.1515/LINGTY.2006.001

SINNEMÄKI, Kaius. “Complexity trade-offs in core argument marking”. In: MIESTAMO, Matti; SINNEMÄKI, Kaius; KARLSSON, Fred, Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 2008, p. 67–88.

SINNEMÄKI, Kaius. Word order in zero-marking languages. Studies in Language, 34(4), p. 869-912, 2010. DOI https://doi.org/10.1075/sl.34.4.04sin

SINNEMÄKI, Kaius. Language universals and linguistic complexity. Three case studies in core argument marking. PhD dissertation, University of Helsinki, 2011.

SINNEMÄKI, Kaius. “Complexity trade-offs: A case study”. In: NEWMEYER, Frederick J.; PRESTON, Laurel B., Measuring Grammatical Complexity. Oxford: Oxford University Press, 2014, p. 179–201.

SPIRTES, Peter; GLYMOUR, Clark; SCHEINES, Richard. Causation, Prediction, and Search. 2nd edn. Cambridge, MA: MIT Press, 2000.

WIJFFELS, Jan. udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the UDPipe NLP Toolkit. R package version 0.8.4-1. 2020. https://CRAN.R-project.org/package=udpipe

WRIGHT, Robert. Nonzero: The Logic of Human Destiny. New York: Pantheon, 2000.

ZEMAN, Daniel; NIVRE, Joakim; ABRAMS, Mitchell; et al. Universal Dependencies 2.6. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, 2020. http://hdl.handle.net/11234/1-3226

ZIPF, George. The Psychobiology of Language: An Introduction to Dynamic Philology. Cambridge, MA: MIT Press, 1965[1935].

ZIPF, George. Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison–Wesley, 1949.