Analysing nominal phrase contexts for the automatic extraction of linguistic and lexicographic data

Use this identifier to cite or link to this record: https://hdl.handle.net/1822/68562

Title:	Analysing nominal phrase contexts for the automatic extraction of linguistic and lexicographic data
Other title(s):	Análise de contextos nominais para extração automática de dados linguísticos e lexicográficos
Author(s):	López Iglesias, Nerea
Advisor(s):	Dias, Idalete; Domínguez Vázquez, María José
Keywords:	Adjectival modification; Argument structure; Lexical function; NLG; Nominal phrase
Date:	2020
Abstract:	One of the main problems in lexicographic work is the difficulty of finding representative examples of real language for inclusion in dictionary entries. Examples taken from linguistic corpora, although they are genuine (written or oral) language products, are often not transparent enough for users. Finding the desired examples in corpora is also not always easy, especially for valency dictionaries, which need very specific examples that illustrate the valency scheme. In recent years, shortcomings have been observed in these resources, notably the absence of large, semantically annotated corpora. This gap has given rise to a series of projects in the field of Natural Language Generation (NLG) that aim to develop tools for the automatic generation of language, with the goal of having these tools produce examples as close as possible to real language. This dissertation aims to contribute to the development of two tools created within two such projects: MultiGenera and MultiComb. Currently, these tools work with a total of ten nouns, for which they can generate nominal phrases formed by the noun and one or more argument structures. To improve the tools, however, it is also necessary to include adjectives that function as attributes (i.e. non-actantial modifiers) of these nouns. This would allow the tools to produce examples closer to natural language and better suited to the purpose of the resources, which are designed to generate examples automatically for PORTLEX, the multilingual dictionary of noun valency. For this reason, the present dissertation aims to analyse and classify the most frequent adjectives that appear as attributes of three Spanish nouns: dolor (pain), olor (smell, odour) and discusión (discussion).
The analysis of the adjectives will be both quantitative (attending to their frequency of use) and qualitative, producing a classification whose main axis is the concept of lexical function (LF). A classification by LF will make it possible to create adjectival semantic packages that can be incorporated into automatic generation tools.
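To make the idea of "adjectival semantic packages" more concrete, the following is a minimal sketch, not the dissertation's method or data. It assumes (noun, adjective) pairs have already been extracted from a corpus, counts adjective frequency per noun (the quantitative step), groups adjectives by a lexical function label (the qualitative step), and uses a package to build a simple noun phrase. The pairs, the LF assignments (Meaning-Text Theory labels such as Magn for intensification, AntiMagn for attenuation and Bon for positive evaluation) and all function names are hypothetical illustrations.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (noun, attributive adjective) pairs, as might be
# extracted from a corpus. Adjectives are stored already inflected to agree
# with the noun. These pairs are illustrative, not the dissertation's data.
PAIRS = [
    ("dolor", "intenso"), ("dolor", "leve"), ("dolor", "intenso"),
    ("dolor", "agudo"), ("olor", "fuerte"), ("olor", "agradable"),
    ("olor", "fuerte"), ("discusión", "acalorada"),
]

# Hypothetical mapping of (noun, adjective) to a lexical function label.
# In practice such a classification is made by a linguist, noun by noun.
LF_OF = {
    ("dolor", "intenso"): "Magn",
    ("dolor", "agudo"): "Magn",
    ("dolor", "leve"): "AntiMagn",
    ("olor", "fuerte"): "Magn",
    ("olor", "agradable"): "Bon",
    ("discusión", "acalorada"): "Magn",
}


def adjective_frequencies(pairs):
    """Count how often each adjective modifies each noun."""
    freqs = defaultdict(Counter)
    for noun, adj in pairs:
        freqs[noun][adj] += 1
    return freqs


def lf_packages(freqs, lf_of):
    """Group each noun's adjectives into packages keyed by lexical function,
    listed from most to least frequent; unclassified adjectives are skipped."""
    packages = defaultdict(lambda: defaultdict(list))
    for noun, counter in freqs.items():
        for adj, _count in counter.most_common():
            lf = lf_of.get((noun, adj))
            if lf is not None:
                packages[noun][lf].append(adj)
    return packages


def generate_np(noun, lf, packages):
    """Build a minimal noun phrase: the noun followed by the most frequent
    adjective in the requested LF package (postnominal position).
    Returns None if no adjective is available for that noun and LF."""
    adjectives = packages.get(noun, {}).get(lf, [])
    if not adjectives:
        return None
    return f"{noun} {adjectives[0]}"


freqs = adjective_frequencies(PAIRS)
packages = lf_packages(freqs, LF_OF)

if __name__ == "__main__":
    print(freqs["dolor"].most_common(1))
    print(generate_np("dolor", "Magn", packages))
    print(generate_np("olor", "Bon", packages))
```

Keying packages by LF rather than by individual adjective lets a generator request a meaning (for example, "intensified pain") and receive a frequency-ranked, noun-specific realisation. This is one plausible way such packages could plug into a generation tool; how MultiGenera and MultiComb actually consume them is not described in this record.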
Type:	Master's dissertation
Description:	European master's dissertation in Lexicography
URI:	https://hdl.handle.net/1822/68562
Access:	Open access
Appears in collections:	BUM - Dissertações de Mestrado