Please use this identifier to cite or link to this item: https://hdl.handle.net/1822/32460

Full record
DC Field | Value | Language
dc.contributor.author | Glez-Peña, Daniel | por
dc.contributor.author | Lourenço, Anália | por
dc.contributor.author | López-Fernández, Hugo | por
dc.contributor.author | Reboiro-Jato, Miguel | por
dc.contributor.author | Fdez-Riverola, Florentino | por
dc.date.accessioned | 2015-01-07T13:42:37Z | -
dc.date.available | 2015-01-07T13:42:37Z | -
dc.date.issued | 2014 | -
dc.identifier.issn | 1477-4054 | por
dc.identifier.uri | https://hdl.handle.net/1822/32460 | -
dc.description.abstract | Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover for all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is set on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well-known in clinical microbiology and similar domains, do not offer programmatic interfaces yet. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as means to cope with gene set enrichment analysis. | por
dc.description.sponsorship | This work was partially funded by (i) the [TIN2009-14057-C03-02] project from the Spanish Ministry of Science and Innovation, the Plan E from the Spanish Government and the European Union from the European Regional Development Fund (ERDF), (ii) the Portugal-Spain cooperation action sponsored by the Foundation of Portuguese Universities [E 48/11] and the Spanish Ministry of Science and Innovation [AIB2010PT-00353] and (iii) the Agrupamento INBIOMED [2012/273] from the DXPCTSUG (Direccion Xeral de Promocion Cientifica e Tecnoloxica do Sistema Universitario de Galicia) from the Galician Government and the European Union from the ERDF unha maneira de facer Europa. H. L. F. was supported by a pre-doctoral fellowship from the University of Vigo. | por
dc.language.iso | eng | por
dc.publisher | Oxford University Press | por
dc.rights | openAccess | por
dc.subject | Web scraping | por
dc.subject | Data integration | por
dc.subject | Interoperability | por
dc.subject | Database interfaces | por
dc.title | Web scraping technologies in an API world | por
dc.type | article | -
dc.peerreviewed | yes | por
dc.comments | CEB14738 | por
sdum.publicationstatus | published | por
oaire.citationStartPage | 788 | por
oaire.citationEndPage | 797 | por
oaire.citationIssue | 5 | por
oaire.citationConferencePlace | United Kingdom | -
oaire.citationTitle | Briefings in bioinformatics | por
oaire.citationVolume | 15 | por
dc.date.updated | 2015-01-05T20:51:40Z | -
dc.identifier.eissn | 1467-5463 | -
dc.identifier.doi | 10.1093/bib/bbt026 | por
dc.identifier.pmid | 23632294 | por
dc.subject.wos | Science & Technology | por
sdum.journal | Briefings in bioinformatics | por
Appears in Collections: CEB - Publicações em Revistas/Séries Internacionais / Publications in International Journals/Series

Files in This Item:
File | Description | Size | Format
document_14738_1.pdf | | 548.87 kB | Adobe PDF
