Incremental Crawling and Shadowing Update Strategy in Search Engines
Abstract: This paper describes the general architecture of a search engine and analyzes the crawling strategies of its crawler and the update modes of its Web page repository. It then presents one reasonably sound crawling and update scheme, together with its implementation techniques, which lets the crawler download "high quality" Web pages incrementally and maintain the "freshness" of the Web repository.