Avoiding question-answering congestion on health services using chatbots

Use this identifier to cite or link to this record: https://hdl.handle.net/1822/80111

Title:	Avoiding question-answering congestion on health services using chatbots
Author(s):	Pereira, Henrique Manuel Palmeira
Advisor(s):	Macedo, Joaquim; Craveiro, Olga
Keywords:	Chatbot; COVID-19; Information processing; Natural language processing
Date:	18-May-2022
Abstract:	Social networks spread a significant amount of fake news and false information every day, and the COVID-19 pandemic confirmed this. General ignorance about the disease led to the spread of misleading information, harming people's lives and governments' efforts to contain it. To fight this infodemic, populations resorted to the national health services' phone lines, congesting them with questions, most of them repeated across individuals and locations. A chatbot for COVID-19-related questions would take this workload off the health services and mitigate the congestion. The chatbot should work in both English and Portuguese.

This work provides a background overview of web crawlers, information processing, and chatbot development, the three components of the application, together with a systematic literature review of these topics. The application consists of three main modules: a web crawler, built on the ACHE crawler application, which downloads web pages from trustworthy sources; a text processor, which parses the web pages and indexes each one in the ElasticSearch index for its language; and a chatbot component, composed of a BERT model fine-tuned on the SQuAD 2.0 dataset and a web interface that queries the ElasticSearch indexes for the most relevant pages and extracts from them the answers to users' questions. To meet the English and Portuguese requirement, two sets of reliable sources were defined (one per language), and a translated version of the SQuAD 1.1 dataset was used to train the Portuguese BERT model. The chatbot selects the model to query according to the language configured in the user's web browser.

The system was evaluated using a set of real COVID-19 QA pairs extracted from the United Nations website, and the results are described in this work. They were far from the desired outcome, so improvements were applied to the crawler and to the ElasticSearch indexes. However, the results were still unsatisfactory, and this work presents a set of future modifications needed to address this.
Type:	Master's dissertation
Description:	Integrated master's dissertation in Informatics Engineering
URI:	https://hdl.handle.net/1822/80111
Access:	Open access
Appears in collections:	BUM - Master's Dissertations; DI - Master's Dissertations
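
The flow described in the abstract (index pages by language, pick the index from the browser language, retrieve the most relevant page, extract an answer) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the in-memory `ToyIndex` is only a stand-in for the ElasticSearch indexes, `extract_answer` is a word-overlap placeholder for the fine-tuned BERT reader, and every function name, URL, and sample text below is hypothetical.

```python
import re
from dataclasses import dataclass, field

SUPPORTED_LANGUAGES = ("en", "pt")  # the two languages named in the abstract


def pick_language(browser_language: str, default: str = "en") -> str:
    """Map a browser language tag such as 'pt-PT' to a supported index language."""
    primary = browser_language.split("-")[0].strip().lower()
    return primary if primary in SUPPORTED_LANGUAGES else default


def tokenize(text: str) -> list:
    """Lowercase word tokens; a real index would apply language-specific analysis."""
    return re.findall(r"\w+", text.lower())


@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for one per-language search index."""
    pages: list = field(default_factory=list)

    def add(self, url: str, text: str) -> None:
        self.pages.append((url, text))

    def search(self, question: str, top_k: int = 1) -> list:
        # Rank pages by the number of distinct question words they contain.
        words = set(tokenize(question))
        scored = [(len(words & set(tokenize(text))), url, text)
                  for url, text in self.pages]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(url, text) for _, url, text in scored[:top_k]]


def extract_answer(question: str, passage: str) -> str:
    """Placeholder for the reader model: return the best-overlapping sentence."""
    words = set(tokenize(question))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    return max(sentences, key=lambda s: len(words & set(tokenize(s))))


def answer(question: str, browser_language: str, indexes: dict):
    """Route the question to the index for the browser language and answer it."""
    lang = pick_language(browser_language)
    hits = indexes[lang].search(question)
    if not hits:
        return None  # nothing relevant was indexed for this language
    url, text = hits[0]
    return lang, url, extract_answer(question, text)


# Hypothetical sample data: one page per language.
indexes = {"en": ToyIndex(), "pt": ToyIndex()}
indexes["en"].add(
    "https://example.org/en/symptoms",
    "COVID-19 spreads between people. Common symptoms include fever and cough.",
)
indexes["pt"].add(
    "https://example.org/pt/sintomas",
    "A COVID-19 transmite-se entre pessoas. Os sintomas comuns incluem febre e tosse.",
)

print(answer("What are the common symptoms?", "en-US", indexes))
print(answer("Quais são os sintomas comuns?", "pt-PT", indexes))
```

Keeping one index per language, as the abstract describes, means the retrieval step and the reader model never have to handle mixed-language text; the cost is that a question asked in an unsupported browser language falls back to a default, which is an assumption of this sketch.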