Processamento linguístico de narrativas produzidas por crianças lusodescendentes e proposta de interface de pesquisa

Use this identifier to cite or link to this record: https://hdl.handle.net/1822/92229

Title:	Processamento linguístico de narrativas produzidas por crianças lusodescendentes e proposta de interface de pesquisa
Other title(s):	Linguistic processing of narratives produced by children of Portuguese descent and proposal of a search interface; Linguistische Verarbeitung von Erzählungen durch Kinder portugiesischer Abstammung und Vorschlag für eine Forschungsschnittstelle
Author(s):	Antunes, João Vieira
Advisor(s):	Dias, Idalete; Flores, Cristina; Rinke-Scholl, Esther
Keywords:	Corpus de narrativas escritas; Crianças bilingues; Interface de pesquisa; Português língua de herança; Processamento de linguagem natural; Bilingual children; Corpus of written narratives; Natural language processing; Search interface; Portuguese as heritage language; Forschungsschnittstelle; Korpus geschriebener Erzählungen; Linguistische Datenverarbeitung; Portugiesisch als Herkunftssprache; Zweisprachige Kinder
Date:	15-Apr-2024
Abstract(s):	Este projeto conjugará duas importantes áreas de estudo da linguística e das humanidades digitais, nomeadamente o bilinguismo e a análise e tratamento de corpora. Para tal, foram transcritas e analisadas 40 narrativas retiradas de um projeto de investigação, coordenado pela Professora Doutora Cristina Flores, investigadora no Centro de Estudos Humanísticos da Universidade do Minho. O estudo centrou-se em crianças lusodescendentes, que vivem na Suíça (cantão alemão), tendo estas sido submetidas a dois instrumentos de recolha de dados em português europeu (PE) e em alemão padrão (AP). Numa das fases do projeto, os participantes ouviram uma história e tiveram de a recontar nas duas línguas, produzindo as narrativas que foram analisadas nesta dissertação. Com efeito, o presente projeto de dissertação tem como objetivo contribuir para a criação de um corpus eletrónico de narrativas produzidas em PE e AP por falantes de herança e o seu processamento. Através de técnicas de processamento de linguagem natural, o corpus foi lematizado e etiquetado ao nível morfossintático [part-of-speech tagging] com recurso ao Sketch Engine, uma ferramenta de análise e gestão de corpora. Estas duas camadas de informação linguística permitiram identificar e analisar padrões linguísticos específicos dos informantes em questão e ainda preservar as próprias narrativas e disponibilizar um recurso para a comunidade científica. Na segunda parte do projeto, é apresentada uma prova de conceito de um protótipo de interface de pesquisa, que tem como objetivo o armazenamento destes dados linguísticos na sua versão anotada e não anotada e a disponibilização deste recurso para toda a comunidade linguística, contribuindo, assim, para a sustentabilidade e preservação deste tipo de recursos.

This project combines two important areas of study within linguistics and digital humanities: heritage bilingualism and the analysis and electronic processing of corpora. To this end, 40 narratives from a research project coordinated by Professor Cristina Flores, a researcher at the Centre for Humanistic Studies at the University of Minho, were transcribed and analysed. The research focused on children of Portuguese descent living in Switzerland (in a German-speaking canton), who were given two data collection instruments in European Portuguese (EP) and Standard German (AP, from the Portuguese "alemão padrão"). In one stage of the project, the participants listened to a story and had to retell it in both languages, producing the narratives analysed in this dissertation. The dissertation therefore aims to contribute to the creation and processing of an electronic corpus of narratives produced in EP and AP by heritage speakers. Using natural language processing techniques, the corpus was lemmatized and tagged at the morphosyntactic level [POS tagging] with Sketch Engine, a tool for analysing and managing corpora. These two layers of linguistic annotation made it possible to identify and analyse linguistic patterns specific to the informants in question, to preserve the narratives themselves, and to make the resource available to the scientific community. The second part of the dissertation presents a proof of concept of a search interface prototype, which is intended to store the linguistic data in its annotated and unannotated versions and make the resource available to the entire linguistic community, thus contributing to the sustainability and preservation of this type of resource.

In diesem Projekt werden zwei wichtige Forschungsbereiche der Linguistik und der digitalen Geisteswissenschaften kombiniert, nämlich die Zweisprachigkeit von Herkunftssprechern und die Analyse und Verarbeitung von Korpora. Zu diesem Zweck wurden 40 Erzählungen aus einem Forschungsprojekt, das von Frau Professor Cristina Flores, einer Forscherin am Zentrum für geisteswissenschaftliche Studien der Universität von Minho, koordiniert wurde, gesammelt, transkribiert und analysiert. Im Mittelpunkt dieses Forschungsprojekts standen Kinder portugiesischer Abstammung, die in der Schweiz (deutscher Kanton) lebten und denen zwei Instrumente zur Datenerhebung in europäischem Portugiesisch (EP) und Hochdeutsch (AP) zur Verfügung gestellt wurden. In einer der Projektphasen mussten die Teilnehmer eine Geschichte anhören und in beiden Sprachen nacherzählen, wodurch die in dieser Arbeit analysierten Erzählungen entstanden. Ziel dieser Arbeit ist es, einen Beitrag zur Erstellung und Bearbeitung eines elektronischen Korpus von Erzählungen zu leisten, die von Herkunftssprechern in EP und AP produziert wurden. Mit Hilfe von Techniken zur Verarbeitung natürlicher Sprache wurde das Korpus lemmatisiert und auf morphosyntaktischer Ebene [POS-Tagging] mit Hilfe von Sketch Engine, einem Tool zur Analyse und Verwaltung von Korpora, getaggt. Diese beiden Ebenen linguistischer Daten ermöglichten es, die spezifischen sprachlichen Muster der betreffenden Informanten zu identifizieren und zu analysieren, die Erzählungen als solche zu erhalten und die Ressource der wissenschaftlichen Gemeinschaft zur Verfügung zu stellen. Der zweite Teil dieser Dissertation stellt einen Proof of Concept eines Prototyps für eine Suchschnittstelle vor, der diese linguistischen Daten in ihren annotierten und nicht-annotierten Versionen speichern und der gesamten linguistischen Gemeinschaft zur Verfügung stellen soll, um so zur Nachhaltigkeit und Erhaltung dieser Art von Ressourcen beizutragen.
Type:	Master's dissertation
Description:	Master's dissertation in Luso-German Studies (Estudos Luso-Alemães)
URI:	https://hdl.handle.net/1822/92229
Access:	Open access
Appears in collections:	BUM - Dissertações de Mestrado; ELACH - Dissertações de Mestrado