Please use this identifier to cite or link to this item: http://hdl.handle.net/1822/44787

Title: Development of a machine learning framework for biomedical text mining
Author(s): Rodrigues, Rúben
Costa, Hugo Samuel Oliveira
Rocha, Miguel
Keywords: Biomedical text mining
Named entity recognition
Machine learning
Issue date: Jun-2016
Publisher: Springer International Publishing
Journal: Advances in Intelligent Systems and Computing
Citation: Rodrigues, Rúben; Costa, Hugo; Rocha, Miguel. Development of a machine learning framework for biomedical text mining. In Mohd Saberi Mohamad, Miguel P. Rocha, Florentino Fdez-Riverola, Francisco J. Domínguez Mayo, Juan F. De Paz (eds.), 10th International Conference on Practical Applications of Computational Biology & Bioinformatics. Advances in Intelligent Systems and Computing, vol. 477. Switzerland: Springer International Publishing, 2016, pp. 41-49. ISBN: 978-3-319-40125-6
Abstract: Biomedical text mining (BTM) aims to create methods for searching and structuring knowledge extracted from the biomedical literature. Named entity recognition (NER), a BTM task, seeks to identify mentions of biological entities in texts. Dictionaries, regular expressions, natural language processing and machine learning (ML) algorithms are used in this task. Over recent years, @Note2, an open-source software framework that includes user-friendly interfaces for important BTM tasks, has been developed, but it did not include ML-based methods. This work proposes BioTML, a framework that includes a number of ML-based approaches for NER, to fill the gap between @Note2 and state-of-the-art ML approaches. BioTML was integrated into @Note2 as a novel plug-in, in which Hidden Markov Models, Conditional Random Fields and Support Vector Machines were implemented to address NER tasks, working with a set of over 60 feature types used to train ML models. The implementation relies on open-source software such as MALLET, LibSVM, ClearNLP and OpenNLP. Several manually annotated corpora were used to validate BioTML. The results are promising, although there is room for improvement.
Type: Conference paper
URI: http://hdl.handle.net/1822/44787
ISBN: 978-3-319-40125-6
e-ISBN: 978-3-319-40126-3
DOI: 10.1007/978-3-319-40126-3_5
ISSN: 2194-5357
Publisher version: http://link.springer.com/book/10.1007/978-3-319-40126-3
Peer-Reviewed: yes
Access: Restricted access (UMinho)
Appears in Collections: CEB - Livros e Capítulos de Livros / Books and Book Chapters

Files in This Item:
document_38995_1.pdf (Restricted access), 427.48 kB, Adobe PDF
