Annotating alphanumeric expressions in clinical narratives

Carlos Antônio de Souza Perini,
Ana Luisa dos Anjos Resende Guimarães


This paper analyzes expressions containing numerals in clinical narrative texts in order to identify potential challenges for their computational processing and to develop guidelines for their annotation within the Universal Dependencies (UD) project. From a corpus of 1,000 clinical narratives, tokens containing at least one numeral in numeric format were selected and classified according to their presentation format and their possible annotation under the UD guidelines. Occurrences of tokens belonging to the ten most frequent classes in the corpus were examined, and annotation guidelines were drawn up for these classes. The guidelines were recorded and will later be used to compile an annotation guide for clinical narrative treebank projects.
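
The selection and classification step described above can be illustrated with a minimal sketch. It is not the procedure actually used in the study: the abstract does not say how tokens were segmented or how format classes were defined, so the shape notation below (digit runs as `9`, letter runs as `a`, other characters kept as they are) and the function names `has_digit`, `shape`, and `top_classes` are assumptions made purely for illustration.

```python
from collections import Counter

ASCII_DIGITS = "0123456789"


def has_digit(token: str) -> bool:
    """Return True if the token contains at least one ASCII digit."""
    return any(ch in ASCII_DIGITS for ch in token)


def shape(token: str) -> str:
    """Map a token to a coarse format class (hypothetical notation).

    Runs of digits collapse to '9', runs of letters collapse to 'a',
    and every other character (punctuation, symbols) is kept as-is.
    """
    out = []
    for ch in token:
        if ch in ASCII_DIGITS:
            sym = "9"
        elif ch.isalpha():
            sym = "a"
        else:
            sym = ch
        # Collapse only consecutive digit or letter symbols.
        if not out or out[-1] != sym or sym not in ("9", "a"):
            out.append(sym)
    return "".join(out)


def top_classes(tokens, n=10):
    """Count format classes over digit-bearing tokens; return the n most frequent."""
    counts = Counter(shape(t) for t in tokens if has_digit(t))
    return counts.most_common(n)


if __name__ == "__main__":
    # Invented tokens, not drawn from the corpus described in the paper.
    sample = ["500mg", "250mg", "dor", "12/03/2019", "120x80", "38,5"]
    for cls, freq in top_classes(sample):
        print(cls, freq)
```

Under this assumed notation, tokens such as a dose written with its unit attached and a date written with slashes fall into different classes, which is the kind of grouping that would let the most frequent formats be examined one class at a time.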


