The teaching of Portuguese as a Host Language has faced many challenges since its establishment as part of the field of Portuguese as a Foreign Language, mainly because it presents specificities unfamiliar to teachers and researchers in the field (GROSSO, 2010; DEUSDARÁ; ARANTES; BRENNER, 2018). Among these challenges are the methods employed to analyze the oral performance of Host Language speakers. With this in mind, this report discusses the evaluation of the oral performance of adult immigrants, beginner students of Brazilian Portuguese as a Host Language, based on two descriptive scales of the measure of Outcome Achievement, a multifaceted measure that looks mainly at pragmatic aspects of language use in tasks whose main focus is the communicative outcome of the performance. Even though the criteria that compose the measure of Outcome Achievement depend on the interpretation of raters, which might introduce inconsistencies into the evaluations, the measure offers an alternative to more traditional measures of oral performance. In conclusion, employing the Outcome Achievement measure to analyze oral performance in tasks for teaching Portuguese as a Host Language brings into focus aspects that are inherent to this teaching context, taking into account the immigrant population’s immediate objectives for language use.


The teaching of Portuguese as a Host Language has grown in importance in recent years, and it has become urgent for research to investigate the processes taking place in these classrooms. Although it is common knowledge that the teaching of Portuguese as a Host Language can be placed alongside other specific teaching contexts under the umbrella field of Portuguese as a Foreign Language, the Host context brings its own specificities to the discussion of how to teach and evaluate language learning, and these have not yet been addressed by the broader area (JENSEN, 2002[1]; LOPES, 2009[2]; DEUSDARÁ; ARANTES; BRENNER, 2018[3]).

Among the specificities of teaching Portuguese as a Host Language are the need for specialized teachers, for personalized teaching materials (SILVEIRA; XHAFAJ, 2020[4]), and consequently for adequate techniques and methods for analyzing the language development of speakers in this context. These aspects have to be coherent with the students’ learning context and, primarily, with their learning objectives. However, the tradition in the field of language learning concerning the evaluation of language development has been strongly tied to standardized tests that look mainly at structural elements of language, which might not be the most appropriate approach in the Host Language context, although some important changes have taken place in recent years (LONG, 2015[5]) with the advance of proficiency tests and scales that are function-oriented rather than grammar-based.

Teaching Portuguese as a Host Language (PHL) involves understanding that the population in focus is composed of immigrants and refugees who have been forcibly displaced¹ from their home countries and are seeking to establish a new life in Brazil. Therefore, language teaching for this population should address situations of immediate adaptation to the ways of living in the new country (GROSSO, 2010[6]). In this respect, research in the field has lacked careful consideration of how to prepare and evaluate language performance in light of this population’s immediate communication needs.

Bearing in mind this context’s need for tailored syllabuses that meet the population’s communication needs and, consequently, evaluate their performance accordingly, this study presents an alternative way of assessing students’ oral performance in tasks in the context of PHL for beginners in Brazil. We assess students’ oral performance not through the well-known CALF measures but through the measure of Outcome Achievement (based on FARIAS, 2018[7]), a multifaceted construct that does not rely solely on vocabulary use or grammatical accuracy, for instance, but instead has a strong concern for meaning and for the contextual elements that may dictate the adequacy of the vocabulary or grammar to be used. This article reports on the process of designing and implementing two oral tasks and two descriptive scales for Outcome Achievement, which were, in turn, designed to match the characteristics of these two oral tasks.

This article is organized as follows. First, we present an overview of the field of Portuguese as a Host Language and of the traditions in evaluating oral language performance in tasks. Then, we describe the method employed to gather data for the study, focusing on the population’s characteristics and on the steps followed to, first, implement the two tasks in a classroom setting; second, design the two Outcome Achievement descriptive scales; and finally, apply them to analyze the participants’ oral performances. To conclude, we present the results of the performances in the two tasks and suggest further steps to investigate the measure.

1. Portuguese as a Host Language and the traditions of oral language assessment of performance in tasks

In order to analyze the proposal of a recently conceptualized measure such as Outcome Achievement, it is key to look at the research traditions that have used different measures to analyze performance and, specifically in this case, oral language performance. To do so, it is important to consider performance in light of the teaching and learning context for which the measure is being proposed here. Therefore, this section is organized to, first, give an overview of the context of learning and teaching Portuguese as a Host Language and, second, present the background of studies that inform the analysis of oral language performance.

The concept of Host Language has frequently been addressed by researchers (GROSSO, 2010[6]; LOPEZ, 2018[8]; OLIVEIRA; SILVA, 2017[9]) debating the recent migration movements taking place in all parts of the world. Although migration has been a common phenomenon throughout the history of humanity, some would say that recent years could be described as “the biggest migration crisis of all times” (DEUSDARÁ; ARANTES; BRENNER, 2018, p. 3228[3], our translation). According to the 2018 report of the United Nations High Commissioner for Refugees (UNHCR[10]), 68.5 million people around the world had recently been forced from their homes, which, according to the report, represents the highest level of displacement on record. Moreover, also according to UNHCR, about 85% of the world’s displaced people are in developing countries, such as Brazil.

As a consequence, the growing number of immigrants who see Brazil as a place in which they can rebuild their lives has caused important changes in the profile of the population who need to speak Portuguese. The urge to understand the specificities of this population has been growing, and it has revealed the field’s need for language teaching and learning practices that fit this population best. It has been perceived that the available tools, usually used for the teaching of Portuguese as a Foreign Language, do not fit this population’s communication needs in the first moments of welcoming in the new country (DEUSDARÁ; ARANTES; BRENNER, 2018[3]).

The need to develop initiatives to assist immigrants in the societies to which they move is urgent, since the situation of forcibly displaced people highlights necessities related to legal affairs, housing, and work, as well as even more urgent matters nowadays, such as access to health systems in view of the COVID-19 pandemic. In all these matters, language is a key aspect that may help or hinder the adaptation of forcibly displaced people to their new home.

In this context, speaking Portuguese is not only a matter of being proficient in interacting with native speakers of the target language in diverse situations; it is a matter of being proficient in solving particular and immediate necessities. According to Grosso (2010[6]), the Host Language concept is linked to the host context and the migratory context. Its audience is generally composed of adults, who might learn Portuguese “due to different contextual needs, often linked to the resolution of urgent survival issues, in which the host language has to be the link of affective (bidirectional) interaction as the first form of integration (in linguistic immersion) to a full democratic citizenship” (GROSSO, 2010, p. 74, our translation[6]).

Arantes, Brenner and Deusdará (2016) also highlight the idea that the “teaching of a host language to refugee youths and adults presents specificities that are not the same as those of foreign language teaching” (p. 1202 apud DEUSDARÁ; ARANTES; BRENNER, 2018, p. 3229[3], our translation) and, because of this, many of the traditional approaches to teaching and learning a language might not consider aspects that are crucial to this context. One example is that this population might not aim at developing the most accurate speaking skills in the short term, but might instead aim to learn enough Portuguese to guarantee effective communication at the city hall, with the Federal Police department, with health professionals, and in other situations of immediate necessity to them (MARCELINO, 2020[11]).

Accordingly, teachers and researchers should find alternatives to prepare learners of Portuguese as a Host Language to perform in these situations and to assess their performances in a way that develops their oral skills globally while also giving special attention to those specific communication needs.

In this aspect, we have seen a growing number of studies that have addressed the teaching of Portuguese as a Host Language, presenting options of practices that allow for a focus on the communication needs of this population, as is the case of studies in the field of Task-Based Language Teaching (TBLT) (SILVEIRA; XHAFAJ, 2020[4]; MARCELINO, 2020[11]; LOPEZ, 2016[12]; 2018[8]; CURSINO; ALBUQUERQUE; FIGUEIREDO SILVA; GABRIEL; ANUNCIAÇÃO, 2016[13], to cite a few recent studies).

What matters about TBLT for the purposes of this article is that at its core is the idea that people use language to plan, perform, and recall the activities done in the “real-world” (LONG, 2015[5]), meaning the world outside the classroom walls. The language classroom, therefore, adopts this functional perspective on language use, bearing in mind the communicative purpose of the interactions that take place in everyday life. According to Long (2015[5]), tasks are the units of “analysis throughout the design, implementation, and evaluation” (p. 6) of a language course, and consequently, the evaluation of a course should be consistent with the kind of tasks students are required to perform in class and, at the same time, outside the classroom.

Turning to oral tasks in the field of TBLT, there has been a preference for analyzing performance through the so-called CALF measures (which stand for Complexity, Accuracy, Lexical density, and Fluency) (FOSTER; SKEHAN, 1996[14]; PALLOTTI, 2009[15]; SKEHAN, 1996, 1998, 2003[16-18]). These measures are usually applied to transcribed speech, with computer programs counting subordination in sentences, number of errors per clause, number and variety of words used, and number of pauses and false starts, to cite a few possibilities. However, some recent studies have questioned the idea that, to perform successfully in an oral task, the speech produced has to be complex, accurate, lexically rich, and fluent (REVESZ; EKIERT; TORGERSEN, 2016[19]). More specifically, research originating in pedagogical contexts has focused on learners’ immediate communication needs and has claimed that, to perform tasks successfully, one’s main concern is to communicate the message and fulfill that need, regardless of violations of language rules that might appear in these learners’ speech.

In this sense, the pragmatic aspects of language play a more important role in communication, and consequently in performance, than structural elements of language such as the adequate use of grammar. We understand that these pragmatic elements are related to the “conditions governing the language use, the linguistic practice” (FIORIN, 2003, p. 161[20], our translation) and to how these conditions affect the linguistic choices made by learners. In our view, the pragmatic elements of the enunciation, such as the people, the time, and the space in which the communication happens, are not captured by the CALF measures. Therefore, there is a need to investigate other assessment tools that consider these aspects when determining communicative success.

In the field of TBLT, Pallotti (2009[15]), as well as Robinson (2001[21]), have already highlighted the fact that communicative success and adequate communication have been little investigated in the field and, consequently, suggestions of alternative measures have been put under scrutiny, such as adequacy and outcome achievement.

The measures of adequacy and outcome achievement assess language performance by taking into consideration, for instance, the context in which the language is being used and the interlocutors the speech addresses. According to Farias (2018[7]), the two constructs are very similar and may overlap depending on the nature of the task to which they are applied. Pallotti (2009[15]) states that Adequacy “represent[s] the degree to which a learners’ performance is more or less successful in achieving the task’s goals efficiently” (p. 596), while Farias (2018[7]) explains that “differently from Adequacy, that puts attention on examining if language was used adequately for communicative purposes, Outcome Achievement is focused on investigating if the expected outcome of a task was accomplished” (p. 69, our emphasis).

Adequacy and outcome achievement are constructs that locate an important part of successful communication in conveying the genre of communication that the task requires, in agreement with the context in which the task is set. Because contexts of language use entail different aspects of language production, the alternative for assessing performance in these terms is to adopt descriptive scales (PALLOTTI, 2009[15]) that are tailored to the tasks learners are required to perform and that consider the pragmatic components of language in use (FIORIN, 2003[20]).

Pallotti (2009[15]) explains that one example of a scale for rating language production is the well-known Common European Framework of Reference for Languages: Learning, teaching, assessment, which “presents a comprehensive descriptive scheme of language proficiency and a set of Common Reference Levels (A1 to C2) defined in illustrative descriptor scales…” (COUNCIL OF EUROPE, 2020, p. 27[22]). In the Brazilian context, there is the Exam for the Certification of Proficiency in Portuguese (CELPE-BRAS), which claims to be “based on the idea of proficiency as the adequate use of language to conduct actions in the world. The exam considers textual elements and, mainly, discursive elements: the context, the purpose and interlocutors involved in the interaction.” (MINISTÉRIO DA EDUCAÇÃO, n/d, our translation[23]).

Although the descriptions of the measures of adequacy and outcome achievement resemble CELPE-BRAS’s idea of proficiency, the oral part of the exam presents limitations for assessing learners in the host language context. The main limitation, in our view, is that the only genre of language use available to assess oral performance in the exam is a conversation or interaction² about a topic chosen by the interlocutor-evaluator.

Our experience as language teachers and researchers has shown that immediate beginners are not yet equipped with the language proficiency necessary to maintain a five-minute conversation about a topic they might not be acquainted with, and that the descriptive scale designed to assess oral performance in this genre in the exam might not be the best fit for assessing performance according to the immediate communicative needs of learners in the host language context. In this context, equipping learners to perform in specific focused tasks might be more urgent. To illustrate, being able to communicate the symptoms of a disease to a health professional is a very urgent task for learners in the host language context.

We understand that every proficiency test presents limitations and is designed with different objectives in mind, and that the descriptive scales provided by the tests are often used to guide curriculum development and even teachers’ choices in language classrooms. Considering the latter, scales such as those of CELPE-BRAS and the CEFR present a comprehensive description of how to assess performance, and might not be a straightforward source for teachers designing and implementing their assessments in the classroom setting of Portuguese as a host language, especially for beginner learners who need to fulfill their immediate communication needs, irrespective of making language mistakes.³

Having considered the different approaches to analyze oral performance in the field of Task-Based Language Teaching, the following section describes the steps involved in the data collection and instruments this article presents.

2. Method

This section reports the process of assessing students’ oral performance in the context of PHL for beginners in Brazil by adopting the measure of Outcome Achievement (based on FARIAS, 2018[7]), a construct developed in the field of Task-Based Language Teaching that looks at adequate performance in a task from a pragmatic perspective. First, we describe the teaching context at which the oral tasks were aimed; next, we present the steps taken to create the two scales of Outcome Achievement; then, we present the procedures for implementing the measure to analyze these students’ oral performances⁴.

The teaching context on which this study focuses comprises a group of 17 students, beginner speakers of Brazilian Portuguese, who were immigrants living in Florianópolis, Santa Catarina, around the first semester of 2019 and who had just started taking language classes at an extension project of the Universidade Federal de Santa Catarina (UFSC) entitled Projeto PLAM.⁵

This group of learners was composed of one Iranian, one Jordanian, two Venezuelan, and 13 Haitian immigrants. The group was far from homogeneous, since the learners varied in sex (10 men, 7 women), nationality (Iranian, Jordanian, Venezuelan, and Haitian), and especially in age (18 to 76 years old), schooling (a few learners had college degrees, while others had not completed secondary school), and language background (while a few students were monolingual, others spoke two or three languages)⁶.

The PLAM project offers classes once a week, for three hours, on Saturday mornings. The classes are free of charge for the students and rely on volunteer teachers and teacher assistants. The project usually offers one class for immediate beginners and one class for students who are higher-beginner to intermediate speakers of Portuguese.

The data collection reported in this article took place in two of those classes, in which students were presented with two cycles of tasks as part of their regular language classes. In the first class, the students participated regularly and performed pedagogical pre-tasks designed to prepare them to perform the final target task (Figure 1). Not all the students present in the class agreed to participate in the study, and some did not complete the activities in class, which was a requirement to have their performance included in the analysis⁷. This explains why the number of participants differs between the two tasks: since data collection took place in a real classroom setting, the researchers had to rely on students’ attendance, on their agreement to participate in the lessons’ activities, and, finally, on their willingness to record their answers.

The cycles of tasks to which the students were presented before the recording tasks were designed taking into consideration the students’ goals for Portuguese language use in their daily lives outside school⁸ and aimed at preparing these students to perform adequately in the final target tasks.

The first task required students to identify three different images of flu symptoms and record their answers (with their phones or with an audio recorder provided by the researcher⁹). The task rubric was: “Daniel chegou no Brasil há uma semana e ainda não fala português. Ele está doente e você vai com ele ao posto de saúde. Explique para o enfermeiro os sintomas de Daniel.” [in English: “Daniel arrived in Brazil a week ago and does not speak Portuguese yet. He is sick and you are going with him to the health center. Explain Daniel’s symptoms to the nurse.”], as presented in Figure 1.

Figure 1. Task 1 (MARCELINO, 2020).

The second task required students to choose an occupation in which they could work and to state previous work experience related to the chosen occupation. Once again, the students were required to record their answers, this time to the prompt: “Grave uma mensagem de Whatsapp oferecendo seu trabalho e falando sobre suas experiências nessa profissão” [in English: “Record a WhatsApp message offering your services and talking about your experience in this occupation”], as presented in Figure 2.

Figure 2. Task 2 (MARCELINO, 2020).

In order to assess the students’ performances in these two tasks, two descriptive scales were designed according to the measure of Outcome Achievement. The two scales followed the same rationale, considering the aspects that were important to achieve each task’s communicative outcome. The communicative outcome of the first task was to communicate the symptoms of the flu to a health professional, while the communicative outcome of the second task was to choose an occupation to offer to a colleague or friend and to state the previous work experience associated with the desired occupation. Thus, because the two tasks had different characteristics, one of the features that composed the measure of Outcome Achievement was adapted to accommodate the specificities of each task.

The first task had a closed outcome, meaning that learners’ responses could be characterized as correct or incorrect by matching them against the visual support provided. The second task had an open outcome that required justification, meaning that students’ responses could only be characterized as correct or incorrect depending on the justification provided for their choice of occupation. This characteristic affected one of the features of the Outcome Achievement measure, which had to be adapted accordingly. As can be seen in Table 1, while the first task has only one Correctness feature, the second task has Correctness A and Correctness B, because of this two-part structure of the answer.

I. Communicative objective: It aimed at giving a general score for the performance, depending on whether the student achieved the task’s communicative objective or not.
II. Correctness (for Task 1); Correctness A and Correctness B (for Task 2): It was specific to each task and concerned whether the students’ responses were correct: in Task 1, according to the images they should describe (Correctness), and in Task 2, whether a student’s alleged professional experience (Correctness B) matched the position they said they were seeking (Correctness A).
III. Communicative context: It comprised giving indications of the situation of interaction, the place, and the people involved. For example, the response should have contextual clues that the genre of the communication in Task 2, recording a WhatsApp message to a colleague or friend, is mainly an informal situation that implies more colloquial language use.
IV. Coherence: It dealt with the level of connection of the ideas conveyed by the students, “whether they followed a well-structured sequence of events in their speech, for instance, by first introducing Daniel (the fictitious character in Task 1) and then stating his health problem and not the other way around” (MARCELINO, 2020, p. 59).
V. Clarity: It comprised the transparency of the meanings being conveyed by the students.
VI. Prosody: In this analysis, it stands for the suprasegmental features of speech: intonation, rhythm, and speed.
VII. Vocabulary: It aimed at analyzing whether the words used in the speech were sufficient and adequate to achieve the communicative outcome of the tasks.
Table 1. Description of the features that composed the measure of Outcome Achievement.

To create the two scales, the teacher of the class, one volunteer teacher assistant of the same group, and the first author of this article analyzed the cycles of tasks that the students would work through before performing each task, since each cycle was designed to prepare the students to perform these specific tasks. Then they identified the key characteristics of each task and, finally, considering the students’ profile (immigrant beginner learners of Brazilian Portuguese), they settled on seven features (shown in Table 1) to be taken into consideration in the analysis.

The features’ descriptions seen in Table 1 were inspired by the studies of Farias (2014, 2018[7,24]), Zaccaron (2017[25]), Specht and D’Ely (2020[26]), and Lima Terres, Torres, and Boeing Marcelino (2020[27]), which have all devised descriptive scales for assessing English learners’ performances in tasks in different modes. While Farias (2014, 2018[7,24]) worked with written narratives, Specht and D’Ely (2020[26]) worked with oral narratives. In turn, Zaccaron, Xhafaj, and D’Ely (2019[28]), Zaccaron (2017[25]), and Lima Terres, Torres, and Boeing Marcelino (2020[27]) assessed short oral messages recorded by means of the WhatsApp mobile application.

In summary, in this study the Outcome Achievement measure comprised seven features, used to analyze students’ oral performance through Likert scales with scores from 0 to 5 (0 being the lowest score and 5 the highest). In the first task, 15 learners participated by recording their responses, while in the second task only 12 learners participated. The Outcome Achievement of the learners’ answers was analyzed by nine raters. Five of these raters were experienced teachers of Portuguese as an Additional Language, with an average teaching experience of 6.8 years; these five raters had also taught other languages, such as English, German, and Japanese. The other four raters had an average of 9.5 years of experience teaching English as a Foreign Language (FL). All nine raters had a Teaching and/or Bachelor’s degree in Letras (English FL and/or Portuguese as a Mother Language (ML)), and all were native speakers of Brazilian Portuguese.

Each rater’s analysis took about 2 hours to complete. The analysis sessions were accompanied by the first author of this article, who explained the overall objectives of the study and presented the raters with the procedures. The decision to have the researcher present throughout the sessions was made to ensure that all raters would have the same or, at least, very similar conditions of evaluation.

First, raters read an instructional text in which concepts such as the Communicative Outcome of a task were explained. Then, the raters read the descriptive scale of each task and the cycles of tasks that were taught before the performance, followed by a review of the descriptive scale of each task¹⁰. Raters could clarify with the researcher any questions they had concerning the evaluation procedures or the constructs involved. The researcher explained that there was no correct or incorrect evaluation of the performances and that each rater’s evaluation would not be judged or compared with other raters’ evaluations¹¹. Finally, when the rater had no further questions about the procedures, the researcher would play the first recording twice, wait for the rater to provide scores and justifications, and proceed to the following recording.

In summary, raters listened to each recording twice and then gave a score from 0 to 5 to each feature of Outcome Achievement. In addition, raters were asked to justify every score given to each student in each feature. Figure 3 illustrates the instructions and the descriptive scale of Task 1, to which the raters responded.

Figure 3. Descriptive scale of Outcome Achievement for Task 1 (MARCELINO, 2020).

As seen in Figure 3, raters had the option of creating one further feature for analyzing the students’ performances; however, none of them suggested a topic of evaluation that was not already covered by the predefined features presented.

After the evaluations were completed, we ran a Cronbach’s alpha test in SPSS (23.0) to check for interrater reliability. According to Taber (2018[29]), reliability tests demonstrate “the extent to which an instrument can be expected to give the same measured outcome when measurements are repeated” (p. 1274). Also according to Taber, many studies in science education have long adopted Cronbach’s alpha as an indication that instruments (usually scales and tests) are fit for their purpose. The tests run on the raters’ raw data for Tasks 1 and 2 indicated a robust result, since Cronbach’s alpha was .802 for Task 1 and .822 for Task 2, as shown in Figure 4, and the benchmark traditionally used in the field is .70 to .80 (LARSON-HALL, 2016[30]). These results indicate a high level of consistency among the raters’ scores.
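The reliability check described above can be sketched in a few lines of Python. This is only an illustration of the standard Cronbach’s alpha formula, not the SPSS procedure run in the study: the function name `cronbach_alpha` is hypothetical, the example scores are made up, and the layout (one row per recorded performance, one column per rater) is an assumption, since the text does not state how the data file was arranged.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of scores.

    ratings: list of rows, one row per recorded performance and one
    column per rater (assumed layout; the source does not state how
    the SPSS data file was arranged).
    """
    n_raters = len(ratings[0])
    if len(ratings) < 2 or n_raters < 2:
        raise ValueError("need at least two performances and two raters")

    # Sample variance of each rater's column of scores.
    rater_variances = [variance(column) for column in zip(*ratings)]
    # Sample variance of the per-performance totals across raters.
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("all performances received the same total score")

    return n_raters / (n_raters - 1) * (1 - sum(rater_variances) / total_variance)


# Hypothetical scores (0 to 5) for three performances and two raters;
# these are made-up numbers, not data from the study.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 3))  # prints 0.667
```

Values closer to 1 mean that the raters ordered the performances more consistently; when every rater gives identical scores, the function returns 1.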

Figure 4. Interrater reliability tests for Tasks 1 and 2 (MARCELINO, 2020).

Regarding the analysis of each learner’s performance and the group mean values in each feature of Outcome Achievement, the procedures are explained in the following section, alongside the tables presenting all the results of the raters’ assessment.

To conclude, the following section presents the results of the students’ oral performances in the tasks, assessed through the Outcome Achievement scales designed according to the characteristics of the tasks presented in Figure 1 and Figure 2.

3. Results and discussion

This section presents the results from students’ recorded answers to the two tasks and discusses them in light of previous studies, pointing to possible implications for understanding speaking performance in the Host Language context.

As mentioned previously, the students’ oral performances were evaluated by nine raters according to the Outcome Achievement measure. In Task 1, 15 students recorded their answers.

To obtain the score for each learner in each feature, the scores attributed by the nine raters to that feature were added and divided by nine; the results of this calculation are presented in Table 3. Then, the 15 learners’ scores for each feature were added and divided by 15 to obtain the group’s mean value for each feature of Outcome Achievement, presented in Table 2 below.

Feature                    N    M      Min    Max    SD     SEM
Communicative Objective    15   3.41   1.56   5.00   1.03   0.27
Correctness                15   3.62   0.67   5.00   1.46   0.38
Context                    15   3.33   1.78   5.00   0.93   0.24
Coherence                  15   3.47   1.67   4.89   1.00   0.26
Clarity                    15   3.67   0.78   5.00   1.13   0.29
Prosody                    15   3.79   1.44   4.67   0.85   0.22
Vocabulary                 15   3.34   1.56   4.67   0.92   0.24
N: number of participants; M: mean, across the 15 participants, of each participant’s mean score from the nine raters; Min: minimum score among the 15 participants; Max: maximum score among the 15 participants; SD: standard deviation of the participants’ scores around the mean; SEM: standard error of the mean.
Table 2. Descriptive statistics of group scores for performance in Task 1 (MARCELINO, 2020).
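The two-step averaging described before Table 2 (raters’ scores averaged per learner and feature, then learners’ scores averaged per feature) can be sketched as follows. The function names, the participant labels, and the ratings are hypothetical and serve only to illustrate the arithmetic; they are not the study’s data.

```python
from statistics import mean


def learner_feature_scores(ratings):
    """Average the raters' scores for every learner and feature.

    ratings maps learner -> feature -> list of scores (one per rater).
    """
    return {
        learner: {feature: mean(scores) for feature, scores in features.items()}
        for learner, features in ratings.items()
    }


def group_feature_means(learner_scores):
    """Average the learners' per-feature scores to get the group mean."""
    features = next(iter(learner_scores.values())).keys()
    return {
        feature: mean(scores[feature] for scores in learner_scores.values())
        for feature in features
    }


# Hypothetical ratings: 2 learners, 2 features, 3 raters (invented values)
ratings = {
    "PA": {"Clarity": [5, 4, 3], "Prosody": [3, 3, 3]},
    "PB": {"Clarity": [2, 2, 2], "Prosody": [4, 5, 3]},
}
per_learner = learner_feature_scores(ratings)
per_group = group_feature_means(per_learner)
print(per_learner)
print(per_group)
```

In the study itself the first step would run over nine raters and seven features, and the second over the 15 learners who recorded Task 1.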

Considering that raters evaluated students’ performance on a six-point scale, from 0 to 5, where 0 was the lowest and 5 the highest score that each rater could attribute, the group obtained, in Task 1, a mean score higher than 3 in all seven features of Outcome Achievement, as presented in TABLE 2. The Standard Deviation values show the dispersion of the scores around the Mean, indicating that students’ scores fluctuated around the mean and were spread across diverse values (LARSON-HALL, 2016[30]). This is mainly explained by the fact that the score range used to analyze the performance is small (0 to 5). The Standard Error of the Mean (SEM), also presented in the table, is the standard deviation of the sample mean and suggests that the Mean estimates are precise, since, according to Larson-Hall (2016), “the smaller the SE is, the more precise the estimate” (p. 84). In Task 1, the feature with the highest mean score was Prosody and the feature with the lowest was Context (3,79 and 3,33, respectively), as shown in TABLE 2.
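As a quick check on how SD and SEM relate, the standard error of the mean is the standard deviation divided by the square root of the sample size. The sketch below applies that textbook definition to the SD values reported in Table 2 (N = 15); the function name is hypothetical, and because the reported SDs are already rounded to two decimals, the recomputed values are approximations.

```python
from math import sqrt


def standard_error_of_mean(sd, n):
    """SEM = SD / sqrt(N), the usual definition for a sample mean."""
    return sd / sqrt(n)


# SD values as reported in Table 2 (Task 1), written with decimal points
table2_sd = {
    "Communicative Objective": 1.03,
    "Correctness": 1.46,
    "Context": 0.93,
    "Coherence": 1.00,
    "Clarity": 1.13,
    "Prosody": 0.85,
    "Vocabulary": 0.92,
}
n_participants = 15

sem = {
    feature: round(standard_error_of_mean(sd, n_participants), 2)
    for feature, sd in table2_sd.items()
}
for feature, value in sem.items():
    print(f"{feature}: {value:.2f}")
```

The recomputed values agree with the SEM column of Table 2 to two decimal places.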

It is interesting to note that, when comparing the minimum and maximum scores of the participants for Task 1, five out of the fifteen participants (namely, P1, P5, P6, P7, and P11 in TABLE 3) received maximum scores for some of the features, while only two out of fifteen (P9 and P15) shared the minimum scores. This means that the participants who did well in the task considerably outnumbered those who did not perform it satisfactorily.

Participant 6 is one of the students who received the highest scores for Outcome Achievement; therefore, according to the raters, her performance exemplifies adequate speech aimed at communicating flu symptoms to a health professional: “Bom dia! Eu e-eu estou aqui porque meu filho Daniel está sentindo problemas de saúde. Ele tem dor de garganta, muita segreçã-segregação nasal e se siente muito cansado. Eu estoy muito assustada”. This performance received the highest grade for the Context feature, taking into consideration that she even created a story for Daniel’s character by saying he was her son. However, raters penalized her, mainly for her trouble in pronouncing the symptom “secreção nasal”. It is interesting to compare the performances of Participant 6 and Participant 5 because, among the learners who performed Task 1, Participant 5 received the highest score for Outcome Achievement, even though one could argue that his answer was a lot simpler than Participant 6’s. Participant 5’s answer was: “O Daniel tem febre, dor de cabeça e dor de garganta”. A possible explanation is that raters evaluated more highly performances that were free of errors than performances that were more creative or detailed: as can be seen in TABLE 3, Participant 5 received a total score of 33,67, while Participant 6 received a total score of 32,44.

Figure 5. TABLE 3 - Students' individual scores by feature for Task 1 (MARCELINO, 2020).

At the opposite end of the scale was Participant 15, who received the lowest mean scores for most features and consequently the lowest score for Outcome Achievement. His answer was as follows: “Daniel [inaudible] doctora. Febre, dor de garanta e se tá + é, ahm, + é segre-ção nasal”. This participant did not mention the correct symptoms and had great difficulty in communicating them clearly, which is reflected in his lowest score for Clarity (see TABLE 3). Raters justified their choices by saying that this performance was filled with pauses between sentences and in the middle of words, which also resulted in a low score for Prosody, for instance. All these aspects hampered the comprehension of the entire message and resulted in a poorer outcome.

Moving on to Task 2, this time only 12 students recorded their performances. As shown in TABLE 4, once again the mean scores of the entire group were higher than 3 for most features of Outcome Achievement, except for the Correctness A feature. Standard Deviation values were low, which indicated little dispersion of scores across the scale. The lowest and highest means were Correctness A and Prosody (2,97 and 3,44, respectively).

Figure 6. TABLE 4 - Descriptive Statistics of group scores for task performance in Task 2 (MARCELINO, 2020).

Task 2, like Task 1, had more participants who received the maximum scores (P4, P5, P8, P9, and P12) than participants who shared the minimum scores (P7 and P10; see TABLE 5).

Figure 7. TABLE 5 - Students' individual scores by feature for Task 2 (MARCELINO, 2020).

Participant 8 received the best evaluation for Outcome Achievement. His answer was: “Eu bom para vaga comércio porque minha experiências com comércio de três anos. É, eu, é, estudei Universidade em Língua e, e Comercial também. Eu gosto também desse trabalho”. Raters justified their scores by saying the student was successful in presenting an occupation clearly and in connecting it adequately to the previous working experience mentioned, which is reflected in his good scores for Correctness A and Correctness B. Although the raters said they needed to make some effort to understand his message, which is reflected in his low score for Clarity, the communicative objective of his message was attained almost completely, as shown by his score for the Communicative Objective feature.

In turn, although Participant 7 received the lowest score for Outcome Achievement, her performance is in some ways not comparable to the performance of the other participants. This is because, in class, she was aided by a teacher assistant and a colleague to complete the recording. Therefore, although some raters considered that Participant 7 was able to pronounce some of the words spelled out by others, other raters considered that, since she did not perform by herself, she did not achieve the outcome of the task, resulting in the lowest score given to a student in this task.

The second lowest score was attributed to the performance of Participant 10, whose answer was as follows: “Oi, ahm, como você, a, como você está? Tudo bem? Mi profi-mi profissão é cozinheiro. Eu tenho experiência de quatro anos, ok?”. According to the raters, although Participant 10 started his performance well by “setting the mood” and greeting the interlocutor, which gave him a good score for Context, the learner had not clearly communicated either the occupation or the experience related to it, since he only mentioned the length of his experience. This is reflected in the low scores attributed mainly to Clarity and to Correctness A and B.

Furthermore, now that we have analyzed students’ performances and learned what raters valued and penalized the most, it is interesting to notice the features with the highest and lowest evaluations. We believe that fruitful discussions can be raised from the fact that, in Task 1, Context received the lowest group mean score (3,33). This feature of Outcome Achievement aimed at evaluating whether the situation of interaction was present in students’ performances, meaning that the students’ choices of language could provide contextual clues about the place where the interaction happened and the people involved. This is related to the pragmatic features of language (FIORIN, 2003[20]), which are so important in this context of teaching and learning. Unsuccessful performances on this feature might indicate that students were not made aware of the importance of context in the interaction and of how it may affect the language used to communicate their messages.

In Task 2, the feature that received the lowest mean score was Correctness A (2,79) and, interestingly, it was not accompanied by Correctness B (3,26), although the two were expected to be connected. It is possible to suggest that students preferred to describe their work experiences in more detail, leading raters to judge the correspondence with the job wanted based on the experiences described. Most of the time, when the experience did not match the job chosen, or students described solely their past working experiences, it was Correctness A that was penalized, as is the case of Participant 3: “É, eu sou, eu sou secretária al-[inaudible] alfânde-alfândega. Experiência em saúde, [inaudible] enfemela, hum de posto de saúde já. Ahm, dois, experiência em dois anos”. This performance showed that the learner chose an occupation that would not typically be connected to her experience as a health professional.

On the other hand, looking at the feature that received the highest group mean scores in both Task 1 and Task 2, we see Prosody (scoring 3,79 and 3,50, respectively). Prosody was the name given to three suprasegmental features of speech: intonation, rhythm, and speed. It could be argued that the high scores this feature received are connected to the fact that both tasks allowed students to achieve an adequate outcome by using short sentences or even isolated words. In Task 1, for example, in a “real-world” situation of communication, arriving at a health center and naming isolated symptoms could lead to successful communication, considering its objectives. An interesting further analysis would investigate whether there is a correlation between Prosody as an Outcome Achievement feature and the measure of Fluency from the CALF measurements.

To conclude, we should remember that the nine raters, although all from the same field, native speakers of Portuguese, and language teachers, diverged in their evaluations. This fact raises two points for discussion. First, the use of raters to validate pedagogical practices might not be adequate, since the process of learning is not determined by a single performance but permeates the teaching and learning practices inside classrooms, of which outside evaluators might be unaware. Second, the fact that the components of Outcome Achievement are subjective, depending on each rater’s concept of successful communication, is an issue that deserves further investigation.

Measures such as the one used in this article to assess oral performance have only recently been included in the field of performance analysis in TBLT, and a growing number of studies have chosen to adopt Outcome Achievement, or Adequacy, in contrast with other well-established measures such as CALF. We agree with Pallotti (2009[15]) that Outcome Achievement, or Adequacy, “should be seen as both an independent construct based on task success and as a way of interpreting CAF measures” (p. 599), since, for example, Context is relevant for determining the adequate level of accuracy expected of a message.

In this article, the differences between raters’ analyses of participants’ performances stemmed from the different ways in which raters operationalized the Outcome Achievement measure. This indicates that, although the measure prioritizes the achievement of the communicative outcome of the tasks, raters’ evaluations are influenced by their perceptions of fluency and accuracy, reflected in the scores attributed to performances according to their lack of errors or their smooth or intricate speech, for instance. This reinforces that this evaluation, like others, might be affected by some degree of subjectivity. Although raters understood the features in the same way, they differed in their evaluations of what was an adequate performance for each of them. This might indicate the importance of having dual-task research designs that comprise both a cognitive approach and outcome fulfillment, as suggested by Skehan (2003). Such a design might illuminate the relationship between raters’ subjective perceptions of fluency and accuracy, for instance, and allow these perceptions to be correlated with measures that provide straightforward scores.

4. Final remarks

The growth in the field of teaching and learning of Portuguese as a Host Language, due to an increasing number of immigrants who find in Brazil a place to rebuild their lives, since many have been forcibly displaced from their previous homes, has made it urgent to understand more deeply the specificities of this context (GROSSO, 2010; SILVEIRA; XHAFAJ, 2020; MARCELINO, 2020). One of these specificities is the inadequacy of traditional measures of oral performance. In the field of Task-Based Language Teaching in particular, the tradition in analyzing oral performance has been to adopt the CALF measures (Complexity, Accuracy, Lexical density, and Fluency), since these measures are believed to be part of and to influence the communicative competence of speakers (FOSTER; SKEHAN, 1996; SKEHAN, 1998; 2003). We know that the number of studies showing the efficacy of the CALF measures is much higher than the number of studies critiquing them. However, we reiterate that assessing only complexity, accuracy, lexical density, and fluency might not be enough to determine whether learners are being successful in their communication (PALLOTTI, 2009; REVESZ; EKIERT; TORGERSEN, 2016).

Moreover, considering the specific context of the Host Language addressed in this article, in which speakers are mostly focused on solving one specific problem of communication or using language adequately enough to “get one’s message across”, adopting the CALF measures to assess oral performance might be inadequate. This study therefore aimed at presenting an alternative way of assessing oral performance in tasks, one that does not look at the CALF measures but takes into consideration the Outcome Achievement measure. This is a multifaceted construct that does not rely solely on vocabulary use or grammatical accuracy, for instance, but instead has a strong concern with meaning and contextual elements, which may dictate the adequacy of the vocabulary or grammar to be used. To that end, 17 adult immigrants, beginner speakers of Brazilian Portuguese, performed two oral tasks and had their performances evaluated by nine raters, native speakers of Brazilian Portuguese and experienced language teachers.

These students obtained different scores for Outcome Achievement, reflected in the varied scores attributed to each of the seven features that compose the measure. These seven features were intended to cover the pragmatic aspects of language (FIORIN, 2003) involved in a situation of communication such as the ones presented in the two tasks investigated here. We believe that this article supports the idea that the process of understanding the communicative needs of the speakers, designing focused tasks according to those needs, and evaluating the speakers’ performances according to the features of these specific tasks is very challenging, and can be a complicating factor when teachers in the classroom have to make decisions on their own about how to evaluate the students.

In this sense, the field of TBLT seems to shed light on what to prioritize at the moment of evaluation and during the preparation of these students to perform a task, since, especially in the context of the Host Language, issues related to the pragmatic elements of language (FIORIN, 2003) are fundamental.


We would like to thank the students who participated in the master’s thesis on which this article was based, the teacher and teacher assistants who were present and aided the challenging process of data collection, and the editors and reviewers of the journal for their valuable contributions to the format and content of this article. Furthermore, we would like to acknowledge that the master’s thesis on which this article was based was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.


  1. COUNCIL OF EUROPE. Common European Framework of Reference for Languages: Learning, teaching, assessment – Companion volume. Strasbourg: Council of Europe Publishing; 2020.
  2. CURSINO C, et al. Português Brasileiro para Migração Humanitária (PBMIH): reflexões linguísticas e pedagógicas para o ensino de PLE em contexto de migração e refúgio. Cursos de português como língua estrangeira no Celin-UFPR: práticas docentes e experiências em sala de aula. Ed. UFPR; 2016.
  3. DEUSDARÁ B, ARANTES P. C. C, BRENNER A. K. “É um problema de todo mundo”: conceitos, métodos e práticas no ensino de português para refugiados. Fórum Linguístico. 2018; 15(3).
  4. FARIAS P. F. TASK-TEST: What lies beyond implementing a Task-Based assessment: Comparing learners’ performance and unveiling learners’ perception in a testing situation. Dissertação (Mestrado em Inglês: Estudos Linguísticos e Literários) - Centro de Comunicação e Expressão da Universidade Federal de Santa Catarina, Florianópolis, 2014.
  5. FARIAS P. F. Critical EFL and Critical Literacy: the impacts of designing and implementing a cycle of tasks in a public school setting for Critical Language and Critical Literacy development. Dissertação (Doutorado em Inglês: Estudos Linguísticos e Literários) - Centro de Comunicação e Expressão da Universidade Federal de Santa Catarina, Florianópolis, 2018.
  6. FIORIN J. L. Pragmática. Introdução à Linguística II: Princípios de análise. Editora Contexto; 2003.
  7. FOSTER P, SKEHAN P. The influence of planning and task type on second language performance. Studies in Second Language Acquisition. 1996; 18(3).
  8. GROSSO M. J. R. Língua de acolhimento, língua de integração. Revista Horizontes de Linguística Aplicada. 2011; 9(2).
  9. JENSEN J. A Problemática das Variações Sociolinguísticas no Ensino do Português como Língua Estrangeira (PLE). Revista de Letras. 2002; 1(24).
  10. LARSON-HALL J. A Guide to Doing Statistics in Second Language Research Using SPSS and R. Routledge: New York; 2016.
  11. LIMA TERRES M, TORRES M. C, BOEING MARCELINO A. F. O impacto do planejamento colaborativo de uma tarefa oral na performance de aprendizes no contexto de EFL. Domínios de Lingu@gem. 2020.
  12. LONG M. Second Language Acquisition and Task-based Language Teaching. John Wiley and Sons: West Sussex; 2015.
  13. LOPES J. H. Materiais didáticos de Português para Falantes de Outras Línguas: do levantamento de produções brasileiras a uma nova proposta. Formação de Professores de português para falantes de outras línguas. EDUEL; 2009.
  14. LOPEZ A. P. A. Subsídios para o planejamento de cursos de Português como Língua de Acolhimento para imigrantes deslocados forçados no Brasil. Dissertação (Mestrado em Linguística Aplicada) – Faculdade de Letras, Universidade Federal de Minas Gerais, Belo Horizonte, 2016.
  15. LOPEZ A. P. A. Portuguese as a Welcoming Language for forcibly displaced immigrants in Brazil: some principles for teaching in light of Interculturality. Revista Brasileira de Linguística Aplicada. 2018; 18(2).
  16. MARCELINO A. F. B. TBLT and Portuguese as a Host language: analyzing learners’ oral performance in terms of Outcome Achievement and investigating the task implementation process through the learners’ and the teacher’s perspectives. Dissertação (Mestrado em Inglês: Estudos Linguísticos e Literários) - Centro de Comunicação e Expressão da Universidade Federal de Santa Catarina, Florianópolis, 2020.
  17. MARCUSCHI L. A. Gêneros textuais: configuração, dinamicidade e circulação. Gêneros textuais: reflexões e ensino. Kaygangue; 2005.
  18. MINISTÉRIO DA EDUCAÇÃO. Certificado de Proficiência em Língua Portuguesa para Estrangeiros (Celpe-Bras). Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira. 2020.
  19. OLIVEIRA G. M, SILVA J. I. Quando barreiras linguísticas geram violação de direitos humanos: que políticas linguísticas o Estado brasileiro tem adotado para garantir o acesso dos imigrantes a serviços públicos básicos? Gragoatá. 2017; 22(42).
  20. PALLOTTI G. CAF: Defining, Refining and Differentiating Constructs. Applied Linguistics. 2009; 30(4).
  21. REVESZ A, EKIERT M, TORGERSEN E. N. The Effects of Complexity, Accuracy, and Fluency on Communicative Adequacy in Oral Task Performance. Applied Linguistics. 2014.
  22. ROBINSON P. Task complexity, cognitive load, and syllabus design. Cognition and Second Language Instruction. 2001.
  23. SILVEIRA R, XHAFAJ D. C. P. The use of tasks in the teaching of Portuguese as a Second Language. Revista Linguagem & Ensino. 2020; 23(2).
  24. SKEHAN P. A Framework for the Implementation of Task-based Instruction. Applied Linguistics. 1996; 17(1).
  25. SKEHAN P. A Cognitive Approach to Language Learning. Oxford University Press: Oxford; 1998.
  26. SKEHAN P. Review article Task-based instruction. Language and Technology. 36.
  27. SPECHT A. L, D'ELY R. C. S. F. Enhancing strategic planning through strategy instruction: the effect of two types of strategy instruction on learners’ oral planned performance. Ilha do Desterro A Journal of English Language, Literatures in English and Cultural Studies. 2020; 73(1).
  28. TABER K. S. The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education. Research in Science Education. 2017; 48(6).
  29. UNHCR. Figures at a glance. 2018.
  30. ZACCARON R, XHAFAJ D. C. P, D'ELY R. C. S. F. “Só mais um minutinho, teacher”: planejamento estratégico colaborativo e individual para tarefas orais em L2 em uma escola pública. Ilha do Desterro A Journal of English Language, Literatures in English and Cultural Studies. 2019; 72(3).
  31. ZACCARON R. The More The Merrier (?): the impact of Individual and Collaborative Strategic Planning on Performance of an Oral Task by Young Learners of English as an L2 in Brazil. Dissertação (Mestrado em Inglês: Estudos Linguísticos e Literários) - Centro de Comunicação e Expressão da Universidade Federal de Santa Catarina, Florianópolis, 2017.