Hybrid Algorithm for Enhancing Focused Web Crawling Using Block Segmentation - Niti Saxena - Book - 9798590387021 - January 28, 2021
If the cover and the title do not match, the title prevails

Hybrid Algorithm for Enhancing Focused Web Crawling Using Block Segmentation

Price
167 yuan
Excluding tax


By "search engine" we usually refer to the actual search that we perform over databases of HTML documents. A search engine is software that helps locate information stored on the WWW. The purpose of partitioning a web page into blocks is to extract only those URLs that belong to relevant blocks and to skip URLs that do not belong to a relevant block. A problem faced by focused crawlers is that they measure relevancy and calculate the URL score for the page as a whole, although a web page usually contains both relevant and irrelevant topics. Page segmentation transforms a multi-topic web page into several single-topic context blocks and thereby improves crawler performance. Blocks such as navigation panels, copyright and privacy notices, unnecessary images, and advertisements distract the user from the actual content and reduce performance. In this thesis, we present a method for dividing web pages into content blocks. The method uses an algorithm that partitions a web page into content blocks with a hierarchical structure, based on the page's pre-defined structure, i.e. its HTML tags. Partitioning web pages into blocks on the basis of headings has an advantage over conventional block partitioning: each block covers a complete topic. The heading, content, images, links, tables, and sub-tables of a particular topic are all included in one complete block.
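
The idea described above (split a page at its headings, then take URLs only from relevant blocks) can be illustrated with a minimal sketch. This is not the book's algorithm: the class `HeadingBlockParser`, the function `relevant_urls`, the keyword-count relevance score, and the `threshold` parameter are all hypothetical names and choices made for illustration, and the sketch produces a flat list of blocks rather than the hierarchical structure the description mentions.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingBlockParser(HTMLParser):
    """Hypothetical helper: starts a new block at every heading tag."""

    def __init__(self):
        super().__init__()
        # Content that appears before the first heading lands in this block.
        self.blocks = [{"heading": "", "text": [], "urls": []}]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # A heading opens a new block; later content belongs to it.
            self.blocks.append({"heading": "", "text": [], "urls": []})
            self._in_heading = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.blocks[-1]["urls"].append(href)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            self.blocks[-1]["heading"] += text
        # Heading text also counts toward the block's relevance.
        self.blocks[-1]["text"].append(text)


def relevant_urls(html, topic_keywords, threshold=1):
    """Return URLs found in blocks whose keyword count reaches the threshold.

    The keyword count is an assumed stand-in for a real relevance measure.
    """
    parser = HeadingBlockParser()
    parser.feed(html)
    parser.close()
    keywords = {k.lower() for k in topic_keywords}
    urls = []
    for block in parser.blocks:
        tokens = " ".join(block["text"]).lower().split()
        score = sum(1 for t in tokens if t.strip(".,;:!?") in keywords)
        if score >= threshold:
            urls.extend(block["urls"])
    return urls
```

Calling `relevant_urls(page_html, ["crawler", "crawling"])` on a page with one section about crawling and one advertisement section would keep the links from the first section and drop those from the second, which is the behaviour the description attributes to block-level URL extraction.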

Media type Book     Paperback Book   (perfect-bound paperback)
Released January 28, 2021
ISBN13 9798590387021
Pages 48
Dimensions 152 × 229 × 3 mm   ·   77 g
Language English