Please use this identifier to cite or link to this item: http://hdl.handle.net/1822/53766

Title: Development of an information retrieval tool for biomedical patents
Author(s): Alves, T.; Rodrigues, Rúben; Hugo Costa; Rocha, Miguel
Keywords: Biomedical Text Mining; Information Retrieval; Information Extraction; Patents; PDF to text conversion
Issue date: Jun-2018
Publisher: Elsevier
Journal: Computer Methods and Programs in Biomedicine
Citation: Alves, T.; Rodrigues, Rúben; Hugo Costa; Rocha, Miguel, Development of an information retrieval tool for biomedical patents. Computer Methods and Programs in Biomedicine, 159, 125-134, 2018
Abstract(s): Background and objective. The volume of biomedical literature has been increasing in recent years. Patent documents have followed this trend and are important sources of biomedical knowledge, technical details and curated data, which are put together during the granting process. The field of Biomedical Text Mining (BioTM) has been creating solutions for the problems posed by the unstructured nature of natural language, which makes searching for information a challenging task. Several BioTM techniques can be applied to patents. Among these, Information Retrieval (IR) comprises processes in which relevant data are obtained from collections of documents. In this work, the main goal was to build a patent pipeline addressing IR tasks over patent repositories, to make these documents amenable to BioTM tasks. Methods. The pipeline was developed within @Note2, an open-source computational framework for BioTM, by adding a number of modules to the core libraries, including patent metadata and full-text retrieval, PDF to text conversion and optical character recognition. User interfaces were also developed for the main operations, materialized in a new @Note2 plug-in. Results. The integration of these tools in @Note2 opens opportunities to run BioTM tools over patent texts, including Information Extraction tasks such as Named Entity Recognition or Relation Extraction. We demonstrated the pipeline's main functions with a case study, using an available benchmark dataset from the BioCreative challenges. We also show the use of the plug-in with a user query related to the production of vanillin. Conclusions. This work makes all the relevant content from patents available to the scientific community, drastically decreasing the time required for this task, and provides graphical interfaces to ease the use of these tools.
Type: Article
Description: Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.cmpb.2018.03.012.
URI: http://hdl.handle.net/1822/53766
DOI: 10.1016/j.cmpb.2018.03.012
ISSN: 0169-2607
Publisher version: http://www.cmpbjournal.com/
Peer-Reviewed: yes
Access: Open access
Appears in Collections: CEB - Publicações em Revistas/Séries Internacionais / Publications in International Journals/Series

Files in This Item:
File: document_47485_1.pdf
Size: 3.95 MB
Format: Adobe PDF
