The Best Way To Track
The best way to identify spider visits is to find out which visitors requested the file robots.txt from your site. As a rule, only spiders request this file, because it tells them which parts of the site they should not crawl, so a well-behaved crawler checks for it before fetching anything else. If you analyze your access log with suitable log-analysis software, you can pick out every visit that began with this request. You can then note the host name and match it to one of the major search engines.
Host names usually contain the search engine company’s name (the host name is the name of the site that hosts the spider). The other identifier for such visits is the agent (or browser) name that each search engine’s spider reports. Get a list of host names and agent names from available resources (these names tend to change often), and build your own list as well by searching your access logs for all occurrences of known engine, host or agent names.
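As a minimal sketch of that log check, the Python below scans access log lines and collects the host and agent name of every client that asked for robots.txt. It assumes the server writes the common Apache/NCSA "combined" log format; the pattern, the function name robots_txt_requesters and the two sample lines are illustrative inventions, not taken from any particular log analyzer, and would need adjusting for other log layouts.

```python
import re

# Assumption: Apache/NCSA "combined" log format, for example
# host - - [date] "GET /path HTTP/1.0" 200 1234 "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def robots_txt_requesters(log_lines):
    """Return {host: agent} for every client that requested /robots.txt."""
    found = {}
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        path = match.group("path").split("?", 1)[0]
        if path == "/robots.txt":
            found[match.group("host")] = match.group("agent") or ""
    return found


# Made-up sample lines, for illustration only.
sample_log = [
    'scooter.pa.altavista.com - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" "Scooter/2.0"',
    'dialup-42.example.net - - [10/Oct/2000:13:56:01 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.0"',
]

spiders = robots_txt_requesters(sample_log)
for host, agent in spiders.items():
    print(host, agent)
```

The log records host names only if the server resolves them (or if the log is post-processed with a resolver); otherwise the host field holds an IP address that has to be looked up separately.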
Concentrate on the top engines only, even though you may find several smaller, lesser-known search engines visiting your site.
Pay attention not only to the total number of visits but also to the activity pattern within each recent visit, so that you can judge how many pages the spider actually covered.
This is a very good way of checking whether submissions have worked, and whether other inducements such as links from other sites have worked. It also lets you evaluate separately the effectiveness of your site's submission, indexing and page ranking.
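One way to look at coverage rather than raw visit counts is to collect the distinct pages each spider host requested. The sketch below assumes the log has already been reduced to (host, path) pairs and that the spider hosts are known; the function name pages_covered and all sample hosts are hypothetical.

```python
from collections import defaultdict


def pages_covered(requests, spider_hosts):
    """Map each known spider host to the set of distinct paths it fetched."""
    covered = defaultdict(set)
    for host, path in requests:
        # The robots.txt request marks the visit; it is not a content page.
        if host in spider_hosts and path != "/robots.txt":
            covered[host].add(path)
    return dict(covered)


# Made-up (host, path) pairs, for illustration only.
requests = [
    ("crawler.example-engine.com", "/robots.txt"),
    ("crawler.example-engine.com", "/index.html"),
    ("crawler.example-engine.com", "/products.html"),
    ("crawler.example-engine.com", "/index.html"),
    ("dialup-42.example.net", "/index.html"),
]

coverage = pages_covered(requests, {"crawler.example-engine.com"})
for host, paths in coverage.items():
    print(host, len(paths), sorted(paths))
```

Here the spider made four requests but covered only two distinct pages, which is the figure that matters when judging how much of the site was crawled.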
Some examples of host names and agent names are given below:
• AltaVista: host name may contain altavista.com; agent is often called Scooter.
• Excite: host name may contain atex or excite.com; agent name is Architextspider.
• PositionTech: host name contains PositionTech.com; Slurp is often used as the agent name.
• Lycos: host name contains lycos.com; Lycos Spider is often part of the agent name.
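The list above can be turned into a small lookup table for tagging log entries. This is only a sketch: the substrings come straight from the examples above, the matching is a plain case-insensitive substring test, and the names SPIDER_SIGNATURES and identify_engine are made up for illustration. Since these names change often, the table would need regular updating.

```python
# Substrings taken from the examples in the text; keep this table current.
SPIDER_SIGNATURES = {
    "AltaVista": {"hosts": ["altavista.com"], "agents": ["scooter"]},
    "Excite": {"hosts": ["atex", "excite.com"], "agents": ["architextspider"]},
    "PositionTech": {"hosts": ["positiontech.com"], "agents": ["slurp"]},
    "Lycos": {"hosts": ["lycos.com"], "agents": ["lycos spider"]},
}


def identify_engine(host, agent):
    """Return the engine whose host or agent substring matches, else None."""
    host_l = host.lower()
    agent_l = agent.lower()
    for engine, sig in SPIDER_SIGNATURES.items():
        host_hit = any(h in host_l for h in sig["hosts"])
        agent_hit = any(a in agent_l for a in sig["agents"])
        if host_hit or agent_hit:
            return engine
    return None


# Made-up host and agent values, for illustration only.
print(identify_engine("scooter.pa.altavista.com", "Scooter/2.0"))
print(identify_engine("dialup-42.example.net", "Mozilla/4.0"))
```

Short substrings such as atex can match unrelated host names, so a match is best treated as a hint to confirm by hand rather than as proof.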
Most search engines accept specific search strings that show whether your URL is included in their index and how many of your pages are indexed. These search strings have been identified and compiled by a number of SEO resources.
To search Google for pages from your URL, for example, enter the following search string in Google search:
allinurl:yourcompanyname.com/webmasters/meta.html (the exact path depends on the pages of your site). In the Yahoo directory, use the command u:yourcompanyname.com to find the listings for this URL. Each search engine has its own similar, but specific, search strings.
Checking the search engine for your URL is likewise a good way to see what that engine has indexed. Together, spider spotting and URL checking let you evaluate and confirm how effective your submission and indexing efforts have been.
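If you check several URLs regularly, the search strings can be generated rather than typed. The sketch below covers only the two commands quoted above; the template keys, the function names and the idea of URL-encoding the string for use in a query parameter are assumptions added for illustration, and other engines' commands would have to be added from a current reference.

```python
from urllib.parse import quote_plus

# Only the two commands mentioned in the text; the keys are illustrative.
QUERY_TEMPLATES = {
    "google": "allinurl:{target}",
    "yahoo_directory": "u:{target}",
}


def index_check_query(engine, target):
    """Build the index-check search string for one engine and one URL."""
    return QUERY_TEMPLATES[engine].format(target=target)


def as_query_parameter(query):
    """URL-encode a search string so it can be placed in a query parameter."""
    return quote_plus(query)


google_query = index_check_query(
    "google", "yourcompanyname.com/webmasters/meta.html"
)
yahoo_query = index_check_query("yahoo_directory", "yourcompanyname.com")
print(google_query)
print(yahoo_query)
print(as_query_parameter(google_query))
```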