Slurp – The PositionTech Robot

Slurp collects documents from the web to build a searchable index for search services that use the PositionTech search engine, including Microsoft and HotBot. Some of Slurp's characteristics are described below:
Frequency of accesses
Slurp accesses a website about once per minute on average. Because network delays are involved, the rate may appear slightly higher over short periods, but the average frequency generally remains at or below once per minute.
robots.txt
Slurp obeys the Robots Exclusion Standard (RES), specifically the 1994 standard. Where the 1996 proposed standard disambiguates the 1994 standard, Slurp follows the proposed standard.
Slurp obeys the first record in the robots.txt file whose User-Agent line contains "Slurp". If there is no such record, it obeys the first record with a User-Agent of "*".
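As an illustration (a minimal sketch; the directory name is hypothetical), a robots.txt file that lets Slurp crawl everything except a /private directory while barring all other robots might look like this:

User-agent: Slurp
Disallow: /private

User-agent: *
Disallow: /

Because Slurp obeys the first record whose User-Agent contains "Slurp", it reads only the first record above and ignores the catch-all "*" record that follows.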
This is discussed in detail later in this book.
NOINDEX meta-tag
Slurp obeys the NOINDEX meta-tag. If you place <META NAME="robots" CONTENT="noindex"> in the head of your web document, Slurp will retrieve the document but will not index it or place it in the search engine's database.
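For example, a minimal (hypothetical) document carrying the tag would look like this, with the meta-tag placed inside the head section:

<HTML>
<HEAD>
<TITLE>Example page</TITLE>
<META NAME="robots" CONTENT="noindex">
</HEAD>
<BODY>
Page content here.
</BODY>
</HTML>

Slurp will still fetch such a page, so the request appears in your server logs, but the page will not be added to the search database.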
Repeat downloads
In general, Slurp downloads only one copy of each file from your site during a given crawl. Occasionally the crawler is stopped and restarted, and it re-crawls pages it has recently retrieved. These re-crawls happen infrequently and should not be cause for alarm.
Searching the results
Documents Slurp crawls are sent to the PositionTech search engines immediately, where they are indexed and entered into the search database shortly afterwards.
Following links
Slurp follows HREF links but does not follow SRC links. This means that Slurp does not retrieve or index individual frames referenced by SRC links.
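To illustrate (the file names are hypothetical), Slurp would follow the ordinary link below but not the frame sources:

<A HREF="chapter2.html">Next chapter</A>

<FRAMESET COLS="20%,80%">
<FRAME SRC="menu.html">
<FRAME SRC="content.html">
</FRAMESET>

Here chapter2.html would be retrieved, while menu.html and content.html would not be, unless they are also reachable through HREF links elsewhere on the site.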
Dynamic links
Slurp can crawl dynamic links and dynamically generated documents, but it does not do so by default. There are good reasons for this: dynamically generated documents can make up infinite URL spaces, and dynamically generated links and documents can differ on every retrieval, so there is little use in indexing them.
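As a hypothetical illustration (example.com and the script names are invented), consider a calendar script in which every page links to the next month:

http://www.example.com/calendar?month=1
http://www.example.com/calendar?month=2
http://www.example.com/calendar?month=3

Each page links onward without end, forming an infinite URL space. Similarly, a URL such as http://www.example.com/page?sessionid=83f1 embeds a session ID that changes on every visit, so the same address never returns the same document twice and indexing it would serve no purpose.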