Use this identifier to cite or link to this record: https://hdl.handle.net/1822/16442

Full record
DC Field | Value | Language
dc.contributor.author | Almeida, J. J. | -
dc.contributor.author | Simões, Alberto | -
dc.date.accessioned | 2012-01-17T17:18:51Z | -
dc.date.available | 2012-01-17T17:18:51Z | -
dc.date.issued | 2010-05 | -
dc.identifier.uri | https://hdl.handle.net/1822/16442 | -
dc.description.abstract | Nowadays, the importance and significance of parallel corpora are so great that they need no special introduction. Unfortunately, publicly available parallel corpora are somewhat limited in range. There are big corpora about politics or legislation, about medicine and other specific areas, but we lack corpora for other areas. Currently there is a huge investment in using the Web as a corpus. This article presents GWB, a tool that aims at the automatic construction of parallel corpora from the web. We argue that it is possible to build high quality terminological corpora in an automatic fashion, just by specifying a sensible Internet domain and using an appropriate set of seed keywords. GWB is a web-spider that works in conjunction with a set of other Open-Source tools, defining a pipeline that includes document retrieval from the web, alignment at sentence level and its quality analysis, bilingual dictionaries and terminology extraction, and construction of off-line dictionaries. | por
dc.language.iso | eng | por
dc.publisher | European Language Resources Association (ELRA) | por
dc.rights | openAccess | por
dc.subject | Parallel corpora | por
dc.subject | Bilingual terminology | por
dc.subject | Web as corpora | por
dc.title | Automatic parallel corpora and bilingual terminology extraction from parallel WebSites | por
dc.type | conferencePaper | -
dc.peerreviewed | yes | por
sdum.publicationstatus | published | por
oaire.citationStartPage | 50 | por
oaire.citationEndPage | 55 | por
oaire.citationTitle | 3rd Workshop on Building and Using Comparable Corpora | por
dc.subject.wos | Social Sciences | por
sdum.conferencePublication | 3rd Workshop on Building and Using Comparable Corpora | por
sdum.bookTitle | LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | por
Appears in collections: DI/CCTC - Artigos (papers)

Files in this record:
File | Description | Size | Format
bucc2010.pdf | Main document | 252.03 kB | Adobe PDF
