This article aims to describe the occurrence of the weak definite (e.g. Ana foi ao hospital, 'Ana went to the hospital'), introduced by Carlson and Sussman (2005), in a corpus of Brazilian Portuguese (BP). We analyzed 400 occurrences of 31 words that can receive a weak reading in BP (e.g. o hospital, 'the hospital'). We first observed whether the word is determined by a definite article, and then whether the reading of the DP is weak (Carlson and Sussman, 2005), strong - or regular - (Russell, 1905), or generic (Carlson, 2005). In addition to the reading, we analyzed the syntactic function of the DP (subject, object, adjunct). As a result, we present the distribution of weak definites in BP, together with a more detailed analysis of those that occupy the subject position.


In this paper, we aim to describe the occurrences of weak (Carlson and Sussman, 2005), regular (Russell, 1905), and generic (Carlson, 2005) definites in a Brazilian Portuguese (henceforth BP) corpus. The debate around the proper semantics of weak definites (or even their existence as an independent use of the definite article) has been extremely productive in the last few years (Carlson and Sussman, 2005; Aguilar-Guevara and Zwarts, 2011; Schwarz, 2012; inter alia). Part of this debate stems from the difficulty in determining in which semantic and syntactic environments those definites appear. Some descriptions are based on distributional hypotheses. For instance, Schwarz’s (2012: 4) hypothesis of semantic incorporation is grounded on the idea that weak definite interpretations are available only for verb objects (or adjuncts), whereas Carlson (2013) believes a weak definite can occur in subject position, albeit very rarely. Therefore, a clearer idea of the weak definite distribution may bring important additions to the discussion concerning definiteness.

The best way to map this distribution is to observe the phenomenon in a corpus, since this provides an empirical and ecologically valid analysis in which the lexical items are taken from the linguistic environment where they occurred (cf. Sardinha, 2004; Kennedy, 1998). Moreover, the position the weak definite occupies in sentences (subject, object, or adjunct) can reveal restrictions on the realization of a weak reading as well as shed light on which syntactic environments may favor this reading. For that reason, we searched the ptTenTen corpus, available on Sketch Engine,1 in an attempt to answer the following questions: in which structures do noun phrases determined by an article appear? Is there a correlation between the weak definite reading and the role it has in the sentence?

We initially chose 50 words that can receive a weak interpretation2 in BP. After a cursory investigation, we selected 31 words and examined the first 400 occurrences of those words. All occurrences of noun phrases whose determiners were single definite articles were sorted into three categories - regular, weak, and generic -, and afterwards we tagged those NPs as subjects, objects or adjuncts. Our observations confirm that it is indeed possible to have a weak definite as a subject, although in very small numbers and in specific constructions. For the purposes of the present paper, we have decided to focus on these items, because of their potential contribution to the present discussion.

In Section 1, we present distinctions amongst weak, regular and generic definites, and review relevant recent literature. In Section 2, we establish the goals of the corpus analysis, and describe the methodology in Section 3. The results are shown in Section 4, and we discuss the relevance of our data to the ongoing debate on the characterization of the weak definite in Section 5.

1. Weak, regular, and generic definites

(1) - Where is Anna?

- She went to the bathroom.

- And what about John?

- He went to the bathroom too.


When one reads dialogue (1), the nominal expression the bathroom may be interpreted as co-referential. Nonetheless, it is equally possible to interpret the two instances as referentially independent. In BP, the latter is the most frequent reading (probably so in English, as well)3. In contrast, the nominal expression the auditorium cannot be referentially disjoint in the following Example (2):


(2) - Where is Anna?

- She went to the auditorium.

- And what about John?

- He went to the auditorium too.4


The highlighted expressions in (2) necessarily refer to the same entity. This usage is the typical regular DP as classically presented by Russell (1905), showing what the author called the uniqueness property. Uniqueness is the property by which the noun phrase “the X” (X being any nominal expression) entails the existence of one and only one entity in the world that could be the referent of X. The auditorium in (2) is uniquely identifiable, and therefore different instances of the same expression in that context necessarily point to the same referent.
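In first-order notation, the Russellian analysis of a sentence of the form “the X is P” is standardly rendered as below. The predicate letters X and P are placeholders introduced here for illustration only; the formula is a textbook rendering, not a formalization taken from the works cited.

```latex
\exists x \,\bigl( X(x) \wedge \forall y \,( X(y) \rightarrow y = x ) \wedge P(x) \bigr)
```

The middle conjunct is the uniqueness clause: anything that satisfies X must be identical to x. It is this clause that the two instances of the bathroom in (1) appear to violate.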

Carlson and Sussman (2005) proposed that cases that lack the uniqueness property, such as (1), constitute a different type of definite phrase, which they named weak definites5.

A main characteristic of weak definites (Carlson and Sussman, 2005; Carlson et al., 2006; Carlson et al., 2013) is that they are lexically restricted. Even within the same construction (e.g. verb to go + the nominal phrase), only the bathroom (1) can be read as weak; the auditorium (2), on the other hand, demands a regular interpretation. Furthermore, the authors (Carlson and Sussman, 2005; Carlson et al., 2006; Carlson et al., 2013) suggest that weak definites do not usually appear in subject position6, appearing instead within constructions like go to the hospital/bathroom. As the authors claim (2013:14), “Weak definites are further restricted by the need to co-occur with, or be “governed by,” certain other lexical items - verbs and prepositions”. Those characteristics and some other experimental data (Carlson et al., 2013; Klein et al., 2013) led the authors to propose an “incorporation” analysis, in which the noun itself lacks uniqueness, while providing a notion of “(cultural) familiarity”7.

What the authors call “incorporation” (their quotation marks) is semantic in nature, not to be confused with what is usually called incorporation in syntactic/morphological theories. One of the central characteristics of incorporation is the so-called semantic enrichment8, or an enriched interpretation (presented by Goldberg, 1995). The fact that weak definites generally present semantic enrichment is an important argument used by Carlson et al. (2013) to support the “incorporation” analysis.

Schwarz (2012) has also claimed that weak definites are part of an incorporation process. According to the author (p.15), “weak definites are definites appearing in verb phrases that denote kinds of events”. He affirms that weak definites are associated with typical activities. Schwarz also highlights that weak definites only appear as objects of certain verbs and/or prepositions. The difference between the analyses in Carlson et al. (2013) and Schwarz (2012) is that the latter holds that the predicate has a ‘regular’ individual as its argument “to combine semantically with kind-denoting terms” (p.16).

An alternative analysis9 is proposed by Aguilar-Guevara and Zwarts (2011, 2013). The authors claim that the weak definite denotes a kind, not an individual, and is therefore a generic DP. They state that weak definites and generic definites would be “different faces of same phenomenon” (2011:15). Unlike Schwarz (2012) and Carlson et al. (2013), Aguilar-Guevara and Zwarts maintain that the noun itself denotes a kind, not only the incorporated VP10. Therefore, this would be the realization of a third type of definite: the generic one. This generic definite, illustrated by the radio in (3), is productive in BP as well as in English.

(3) Marconi invented the radio11.

The analysis in Carlson et al. (2013) is ultimately difficult to distinguish from Aguilar-Guevara and Zwarts’, but the authors place the focus of their investigation elsewhere. They argue that the most important aspect is to delineate clearly the compositional role fulfilled by the definite article in those constructions (p. 19):


Also, in a way the incorporation analysis we are assuming makes it very hard to distinguish it from the “kinds” analysis discussed above. This is because the meaning of the incorporated bare nominal form is “generic” in the sense that it does not have the capacity for individual reference. So the issue for us is primarily the compositional role of the definite article, and not primarily about generic interpretations. (authors’ emphasis)


Similarly, in our work, we endeavor to remain relatively agnostic as to which theory is correct; however, we aim to contribute to a better understanding of the compositional role of the definite article. When categorizing our corpus instances, we conservatively decided to hold on to Carlson’s initial characterization of weak definites, while remaining open to adopting alternative explanations. Consequently, we kept an initial distinction between generic and weak definites: we categorized our tokens as prototypical generic definites (Carlson, 2005) where they fit that description, and initially categorized as weak those tokens for which a weak reading was possible. Examples (4) and (5) below illustrate this approach. Provisionally, we have decided that examples like (4) - where there is a clear reference to a kind or a whole category - are to be called generic definites, whereas instances similar to (5) - where there is an individual that is not uniquely identifiable - were labelled weak.

Figure 1.

As expected, the distinction is not always easy to make, and there are plenty of examples that defy the initial criteria. One of the most common cases is the one where there is a clear generic context, but the proper DP categorization is somewhat more doubtful. For instance, in Example (6): is the airplane a weak or generic definite?

Figure 2.

If the radar shows that the airplane has been diverted for more than a few kilometers, or degrees, from the initial flight plan, the first response is an attempt from the FAA’s controller to contact (the pilot) via radio.

A possible criterion to decide between the alternatives is to use a property of weak definites observed by Aguilar-Guevara (2011), Schwarz (2012) and other authors: a pronoun referring back to a weak definite always presents a unique reading, as shown in (7):


(7) Bill is in the hospital_i, and John is, too. It_i has an excellent heart surgery department.12


Example (7) shows that a DP that may initially be read as weak has its interpretation decidedly steered towards a regular one when it is referred back to by a pronoun (it in the example). We suggest, in contrast, that the same does not happen with generic definites. Compare Examples (8b) and (8c) below. A pronoun whose antecedent is a generic still refers to a kind: if the generic definite is picked up again by a pronoun, the generic reading is maintained. In (8a) the guitar is introduced as a clear kind reference. If a pronoun refers to it in a following sentence (as in 8b), the guitar still has a generic reading.

Figure 3.

(8b) And one reason is that it is cheaper than most instruments.


In contrast, a sentence that does not allow a generic reading (as in 8c) fails to make an appropriate reference to the antecedent:


(8c) *And it is broken.


For instance, if we retrieve Example (6) above, and create a continuation with an anaphoric pronoun for it (as in 6b) the regular interpretation is enforced:


(6) If the radar shows that the airplane has been diverted for more than a few kilometers.

(6b) It was hijacked.


This distinction in Example (6) (although, by all means, not conclusive) is an indication of a greater similarity between weak definites and generic ones, and was the criterion operationally adopted here for the categorization.

In addition, by categorizing definite expressions and looking into their distribution, we intended to evaluate their semantic behavior in different structures, which, in turn, may help to understand the incorporation matter, especially in subject occurrences.

2. Methodology

2.1 The ptTenTen corpus

The ptTenTen corpus used for this study has a database of more than 2.7 million words, and was developed by Kilgarriff et al. (2014)13. The corpus was built by collecting language productions from the Web, given the Internet's richness in informal and speech-like genres.

2.2 Materials

We extracted the first 400 occurrences of the selected 31 words, regardless of determination. Then, from this subset, we selected the words preceded by a definite article, with or without a preposition14. Nouns preceded by indefinite articles, nouns determined by pronouns, bare nominal expressions, and any other nouns not determined by a definite article were excluded.

We also excluded words that were part of a title or a proper name, an idiom, or a metaphor, as well as items that functioned as adjectives in specific contexts. We did not include any words that deviated from the expected form (probably due to typographical errors) or from the expected meaning. Lastly, we focused only on singular nominal expressions. The final item count included in the present study is 2196.
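The first, purely formal step of this selection can be sketched in code. The sketch below is only an illustration under stated assumptions: the word lists, the `(previous_word, noun)` pair format, and the function names are hypothetical and are not the procedure or tooling the study actually used.

```python
# Illustrative sketch only: word lists and token format are assumptions,
# not the authors' actual extraction procedure.

# Singular definite articles in Portuguese.
DEFINITE_ARTICLES = {"o", "a"}
# Common contractions of a preposition with a singular definite article.
CONTRACTIONS = {"ao", "à", "do", "da", "no", "na", "pelo", "pela"}


def has_singular_definite(previous_word: str) -> bool:
    """Return True if the word right before the noun is a singular definite
    article, either bare or contracted with a preposition."""
    word = previous_word.lower()
    return word in DEFINITE_ARTICLES or word in CONTRACTIONS


def keep_candidates(hits):
    """Keep only (previous_word, noun) pairs introduced by a definite article."""
    return [(prev, noun) for prev, noun in hits if has_singular_definite(prev)]


# Hypothetical concordance hits for the noun "hospital".
hits = [("ao", "hospital"), ("um", "hospital"), ("No", "hospital"),
        ("este", "hospital"), ("o", "hospital")]
print(keep_candidates(hits))
# prints [('ao', 'hospital'), ('No', 'hospital'), ('o', 'hospital')]
```

A string check of this kind cannot tell the article a from the homographic preposition a, nor can it detect titles, proper names, idioms, metaphors, or adjectival uses, so the remaining exclusions described above still require inspection of each occurrence in context.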

2.3 Procedures

Each item was extracted and analyzed in its clause. All the procedures and tests were decided prior to the beginning of the analysis. Every occurrence was analyzed by, at least, two researchers, and the tests described below were used to make the final decision.

We proceeded to categorize each item as weak, regular, or generic. Simple tests were applied to each token to help with the label attribution. Weak definites were operationally defined here as expressions that refer to individuals, but lack a uniquely identifiable referent (9); regular definites had uniquely identifiable referents (10); and the generics were the ones referring to kinds (11).

Figure 4.

The sloppy identity15 test was applied to differentiate weak and regular definites. In this case, only weak definites are felicitous with sloppy identity, as the contrast between (9a) and (10a) shows.


(9) a. Mara Silva, 56 years old, ended up in the hospital. João, 36 years old, did too. (different hospitals acceptable)

(10) a. In Jenin, the Israeli occupation surrounded the hospital and expelled the press. The Syrian occupation did too (must be the same hospital).


The test proposed in the introduction was used in controversial decisions between generic and weak definites. The test uses a pronoun and analyzes its consequence for the identity of the referent. In cases where the presence of the pronoun generated referent individualization, we decided to categorize the expression as weak; otherwise, we categorized it as generic. The comparison between (9b) and (11b) illustrates the test once again:


(9) b. Mara Silva, 56 years old, ended up in the hospital. It was very dirty.

(11) b. *The second half of the XIXth century, death doesn’t happen at home anymore, under the eyes of the family, but in the hospital. It was very dirty.
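The way the two tests combine into a three-way label can be summarized as a small decision procedure. This is a minimal sketch under assumptions: the function name, its parameters, and the order in which the judgments are consulted are introduced here for illustration, and each input stands for a human annotator's judgment rather than anything computed automatically.

```python
from typing import Optional


def classify_definite(kind_reading: Optional[bool],
                      sloppy_identity_ok: bool = False,
                      pronoun_individualizes: bool = False) -> str:
    """Combine annotator judgments into a label (hypothetical helper).

    kind_reading: True if the DP clearly refers to a kind, False if it
        clearly refers to an individual, None if the case is disputed.
    sloppy_identity_ok: True if a 'did too' continuation allows
        different referents.
    pronoun_individualizes: True if a pronoun continuation forces a
        unique, individual referent.
    """
    if kind_reading is True:
        return "generic"
    if kind_reading is None:
        # Disputed between generic and weak: use the pronoun test.
        return "weak" if pronoun_individualizes else "generic"
    # Individual reference: the sloppy identity test separates weak from regular.
    return "weak" if sloppy_identity_ok else "regular"


print(classify_definite(False, sloppy_identity_ok=True))      # as in (9a): weak
print(classify_definite(False, sloppy_identity_ok=False))     # as in (10a): regular
print(classify_definite(True))                                # as in (11): generic
print(classify_definite(None, pronoun_individualizes=True))   # as in (6b): weak
```

Making the disputed case an explicit third value keeps the pronoun test confined to the controversial decisions, which is how the procedure is described above.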


In the next stage, we further classified each token according to its syntactic role into subject, object, adjunct, or other. To dispel any questions on specific cases where it is debatable if one token is an object or an adjunct, we have turned to Luft (2010). The “other” classification was chosen when the expression had any other syntactic function, e.g. passive agent.

3. Results

We observed the distribution presented in Graphic 1, which shows that the regular reading is significantly more frequent than the others, at 45.6% (χ² = 205.2568, df = 2, p-value < 0.01 × 10^-14), as expected. An interesting finding is that, according to the categorization criteria employed here, weak DPs account for 33.7% of the occurrences, significantly more (χ² = 68.5059, df = 1, p-value < 0.01 × 10^-14) than the generic ones, at 20.7%.

Figure 5.

GRAPHIC 1: Frequency of definite types
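The goodness-of-fit statistics reported for the frequency of definite types can be reproduced with a few lines of code. The sketch assumes that the tests compare observed counts against equal expected frequencies. The raw counts are not given in the text: the values used below are an assumption, chosen because they add up to the 2196 items and yield the reported statistics.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected frequencies."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Assumed counts (not reported in the text); they sum to 2196.
regular, weak, generic = 1002, 740, 454

three_way = chi_square_uniform([regular, weak, generic])
weak_vs_generic = chi_square_uniform([weak, generic])

print(f"{three_way:.4f} (df = 2)")        # prints 205.2568 (df = 2)
print(f"{weak_vs_generic:.4f} (df = 1)")  # prints 68.5059 (df = 1)
```

The degrees of freedom are the number of categories minus one, which is why the three-way comparison has df = 2 and the pairwise comparison has df = 1.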

Afterwards, we examined the relation between syntactic function and the classification of definites. It is no surprise that adjuncts are the most productive category, since one sentence can have only one subject and a maximum of two objects, but an open number of adjuncts. That is the reason why adjuncts present the largest number of occurrences in all categories. Interestingly, however, weak definites (Graphic 2) appear as adjuncts (45.7%) as often as they appear as objects (46.6%) (χ² = 0.0544, df = 1, p-value = 0.8156). As described in the introduction, Carlson et al. (2006) were correct in saying that weak definites seldom appear as subjects - only 7.2% of the occurrences (Graphic 2), significantly less than the other categories (χ² = 212.5607, df = 2, p-value < 0.01 × 10^-14).

Figure 6.

GRAPHIC 2: Weak x Syntactic Function

Generics (Graphic 3) are more uniformly distributed between subject (25.1%) and object (20.3%), with adjuncts accounting for just over half of the data (54.6%). This difference is confirmed by the chi-square test (χ² = 85.6232, df = 2, p-value < 0.01 × 10^-14). The same overall pattern is observed with regular definites (Graphic 4), which present a significant majority of adjuncts (43.7%) (χ² = 52.7934, df = 2, p-value < 0.01 × 10^-10), followed by objects (31.3%) and subjects (25%).

Figure 7.

GRAPHIC 3: Generic x Function

Figure 8.

GRAPHIC 4: Regular x Syntactic function

The most important finding in this distribution pattern is the occurrence of a small number of subjects that are weak definites. From a universe of 45 of these, 42 (93.3%) have a similar structure, exemplified in (6) above and repeated here:

Figure 9.

If the radar shows that the airplane has been diverted for more than a few kilometers, or degrees, from the initial flight plan, the first response is an attempt from the FAA’s controller to contact (the pilot) via radio.

In those cases, it is disputable whether we are dealing with a weak or a generic definite. The individualization test we devised here suggests that we should subscribe to a weak interpretation16. In the remaining cases, admittedly very few (7% of the weak definites in subject position), it is more difficult to waive the weak interpretation in favor of a generic one. Observe Example (12):

Figure 10.

In this case, the context makes it clear that one is referring to no specific and unique airplane, but one that is linked to an airplane path, although not identifiable.

4. Discussion

Both Carlson et al.'s (2013) and Schwarz's (2012) incorporation analyses are possibly supported by the examples of weak definites in object and adjunct roles. But the presence of weak definites in subject position by itself calls into question their proposal as presented in recent works, since subjects cannot, by definition, be incorporated into VPs. Our characterization of examples similar to (12) and (6) as weak is, admittedly, based on a test that has some limitations. In theory, it is possible to defend another characterization (at least for cases comparable to Example (6)), and, instead, dispense with the individualization test as an adequate way to single out weak definites altogether.

In our opinion, the problem we face here does not lie solely in the adequacy of the test for one structure or another, but in the very subjective nature of the tests themselves. To adequately untangle different interpretations of definite expressions, and even to establish if there are actual different interpretations, we need to use measures that are more objective, and gather real-world (ideally experimentally controlled) data. The work presented here is relevant because it brings real data to the table and creates at least one starting point for a deeper investigation. We believe that future experimental work with these particular subject occurrences can be important to shed light on the definiteness problem, and minimally help to discard some inadequate explanations (for instance, VP incorporation), or refine the definition of weak definites.

Further work with the corpus remains to be done. One possible avenue is to better examine the role of weak definite adjuncts within the incorporation framework. Considering that adjuncts do not take part in the verb argument structure and are optional, how are they incorporated into VPs? Are there specific kinds of adjuncts that allow this incorporation or are they linked to some verbal characteristic?

We hope to have shown that more empirical forms of investigation are possible and do bring important evidence to bear on this type of phenomenon. Furthermore, we hope to continue pursuing this line of work and to stimulate others to follow it as well.


