Use this identifier to cite or link to this record: https://hdl.handle.net/1822/72111

Title: Catching web crawlers in the act
Author(s): Lourenço, Anália Maria Garcia
Belo, Orlando
Keywords: Data webhousing
Traverse patterns
Web crawler
Web usage mining
Date: 2006
Publisher: Association for Computing Machinery (ACM)
Citation: Anália G. Lourenço and Orlando O. Belo. 2006. Catching web crawlers in the act. In Proceedings of the 6th international conference on Web engineering (ICWE '06). Association for Computing Machinery, New York, NY, USA, 265–272. DOI: https://doi.org/10.1145/1145581.1145634
Abstract: This paper recommends a new approach to the detection and containment of Web crawler traverses based on clickstream data mining. Timely detection prevents abusive consumption of Web server resources by crawlers and possible violations of site content privacy or copyright. Clickstream data differentiation ensures focused usage analysis, valuable for profiling both regular users and crawlers. Our platform, named ClickTips, sustains a site-specific, updatable detection model that tags Web crawler traverses based on incremental Web session inspection, and a decision model that assesses possible containment. The goal is to deliver a model that is flexible enough to keep up with the continuous evolution of crawling and that is capable of detecting crawler presence as soon as possible. We use a real-world Web site case study to support the process description, as well as to evaluate the accuracy of the obtained classification models and their ability to discover previously unknown Web crawlers.
Type: Conference paper
URI: https://hdl.handle.net/1822/72111
ISBN: 1595933522
DOI: 10.1145/1145581.1145634
Publisher version: https://doi.org/10.1145/1145581.1145634
Peer reviewed: yes
Access: Restricted access (UMinho)
Appears in collections: CAlg - Artigos em livros de atas/Papers in proceedings

Files in this record:
File: 2006-CI-ICWE-Lourenco&Belo-CRP.pdf
Description: Restricted access
Size: 781 kB
Format: Adobe PDF
