Use this identifier to cite or link to this record: https://hdl.handle.net/1822/25214

Full item record
DC Field | Value | Language
dc.contributor.author | Faria, Luís | -
dc.contributor.author | Akbik, Alan | -
dc.contributor.author | Sierman, Barbara | -
dc.contributor.author | Ras, Marcel | -
dc.contributor.author | Ferreira, Miguel | -
dc.contributor.author | Ramalho, José Carlos | -
dc.date.accessioned | 2013-09-20T09:43:06Z | -
dc.date.available | 2013-09-20T09:43:06Z | -
dc.date.issued | 2013-09 | -
dc.identifier.uri | https://hdl.handle.net/1822/25214 | -
dc.description.abstract | The ability to recognize when digital content is becoming endangered is essential for maintaining long-term, continuous and authentic access to digital assets. To achieve this ability, knowledge about aspects of the world that might hinder the preservation of content is needed. However, the processes of gathering, managing and reasoning on this knowledge can become infeasible to perform manually as the volume and heterogeneity of content increase, multiplying the aspects to monitor. Automation of these processes is possible [11,21], but its usefulness is limited by the data it is able to gather. Up to now, automatic digital preservation processes have been restricted to knowledge expressed in a machine-understandable language, ignoring a plethora of data expressed in natural language, such as the DPC Technology Watch Reports, which could greatly contribute to the completeness and freshness of data about aspects of the world related to digital preservation. This paper presents a real case scenario from the National Library of the Netherlands, where the monitoring of publishers and journals is needed. This knowledge is mostly represented in natural language on the publishers' Web sites and is therefore difficult to monitor automatically. In this paper, we demonstrate how we use information extraction technologies to find and extract machine-readable information on publishers and journals for ingestion into automatic digital preservation watch tools. We show that the results of automatic semantic extraction are a good complement to existing knowledge bases on publishers [9, 20], finding newer and more complete data. We demonstrate the viability of the approach as an alternative or auxiliary method for automatically gathering information on preservation risks in digital content. | por
dc.description.sponsorship | KEEP SOLUTIONS | por
dc.language.iso | eng | por
dc.publisher | Biblioteca Nacional de Portugal (BNP) | por
dc.relation | info:eu-repo/grantAgreement/EC/FP7/270137 | por
dc.rights | openAccess | por
dc.subject | Digital preservation | por
dc.subject | Monitoring | por
dc.subject | Watch | por
dc.subject | Natural language | por
dc.subject | Information extraction | por
dc.title | Automatic preservation watch using information extraction on the Web: a case study on semantic extraction of natural language for digital preservation | por
dc.type | conferencePaper | -
dc.peerreviewed | yes | por
dc.relation.publisherversion | http://purl.pt/24107 | por
sdum.publicationstatus | published | por
oaire.citationConferenceDate | 03 - 05 Sep. 2013 | por
sdum.event.type | conference | por
oaire.citationStartPage | 215 | por
oaire.citationEndPage | 224 | por
oaire.citationConferencePlace | Lisboa, Portugal | por
oaire.citationTitle | iPRES 2013 - 10th International Conference on Preservation of Digital Objects | por
sdum.conferencePublication | iPRES 2013 - 10th International Conference on Preservation of Digital Objects | por
Appears in collections: KEEPS - Comunicações

Files in this item:
File | Description | Size | Format
ipres13-Scout+InformationExtraction.pdf | Paper | 232.84 kB | Adobe PDF
Scout_iPRES13_scout-kraken.pdf | Presentation | 4.88 MB | Adobe PDF
