Robots.txt

Have you ever wondered why the error log of your web server keeps filling up with entries like this one?

[error] [client 204.62.245.187] File does not exist: /usr/local/etc/httpd/htdocs/mysupersite/robots.txt

When you submit your website to a search engine, its spider "visits" your site to register it. Most spiders automatically look for the robots.txt file first. If this file is not found, the web server logs the error shown above.

The robots.txt file is not compulsory, however. Instead of a file, you can use the "robots" meta tag in your pages. But if you do not provide a robots.txt file and submit your site to hundreds of search engines (e.g. with Hello Engines!), you will also receive hundreds of these error messages. Keep in mind that your website is probably visited by several search engines every day, so the error.log file can quickly grow very large as it fills up with irrelevant entries.
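To make the meta tag alternative concrete, here is a minimal Python sketch that reads the "robots" meta tag from a page, roughly as a spider might. The sample page, its "noindex, nofollow" value and the class name RobotsMetaParser are assumptions made up for illustration; they are not taken from any particular search engine.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # The content attribute holds a comma-separated list of directives.
            self.directives += [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


# Hypothetical page that asks robots not to index it or follow its links.
page = """<html><head>
<title>Client area</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

meta_parser = RobotsMetaParser()
meta_parser.feed(page)
print(meta_parser.directives)
```

Unlike robots.txt, the meta tag has to be placed in every page it should apply to, and it does not stop spiders from requesting robots.txt.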

In the robots.txt file of your site, you can define which pages are to be excluded from indexing. Please note that only one robots.txt is taken into account per server and that it must be located at the top level of the site. On a UNIX system, for example, it could be stored at

/usr/local/etc/httpd/htdocs/robots.txt
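Because the file always sits at the top level, its address can be derived from any page address on the same server. A minimal sketch in Python; the function name robots_txt_url and the www.example.com address are illustrative assumptions:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the top-level robots.txt address for the server hosting page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/products/list.html?page=2"))
```

A robots.txt placed in a subdirectory such as /products/ would never be requested this way, which is why it has no effect there.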

The syntax of the robots.txt is extremely simple and generally looks like this:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /client data/

In the above case, two directories are excluded from indexing. For each directory that is not to be indexed by the spiders, you must add a separate "Disallow" line.
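One way to check such rules before uploading them is Python's standard urllib.robotparser module. The sketch below feeds it the example rules from above and asks whether two addresses may be fetched. The host www.example.com and the robot name ExampleBot are placeholders, and real spiders may interpret edge cases differently from this module.

```python
from urllib.robotparser import RobotFileParser

# The example rules, one string per line of robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /client data/",
]

rules_parser = RobotFileParser()
rules_parser.parse(rules)

# www.example.com and ExampleBot are placeholders for illustration.
for url in (
    "http://www.example.com/index.html",
    "http://www.example.com/cgi-bin/search.pl",
):
    allowed = rules_parser.can_fetch("ExampleBot", url)
    print(url, "->", "allowed" if allowed else "blocked")
```

The first address is reported as allowed and the second as blocked, because only the second starts with one of the disallowed directory paths.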

Example: to block all robots from accessing and indexing your website, enter the following lines in the robots.txt file:

    User-agent: *
    Disallow: /

To allow all robots to access and index all pages of your website, enter the following lines in robots.txt:

    User-agent: *
    Disallow:

To prevent a specific robot from accessing your directories, name it in the "User-agent" line and disallow everything:

    User-agent: Yahoo
    Disallow: /

To allow only one specific robot to index your directories (thus blocking all others), enter the following lines, separating the two records with a blank line:

    User-agent: Yahoo
    Disallow:

    User-agent: *
    Disallow: /
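The effect of combining a record for one robot with a catch-all record can be checked with Python's standard urllib.robotparser module. This is only a sketch: OtherBot is a made-up robot name, and the matching shown is that of the Python module, not necessarily that of every spider.

```python
from urllib.robotparser import RobotFileParser

# Two records: one for the robot named Yahoo, one for everybody else.
records = """\
User-agent: Yahoo
Disallow:

User-agent: *
Disallow: /
""".splitlines()

record_parser = RobotFileParser()
record_parser.parse(records)

# The record that names the robot takes precedence over the "*" record.
yahoo_allowed = record_parser.can_fetch("Yahoo", "/index.html")
# OtherBot is a hypothetical robot that has no record of its own.
other_allowed = record_parser.can_fetch("OtherBot", "/index.html")

print("Yahoo:", yahoo_allowed)
print("OtherBot:", other_allowed)
```

The robot named in its own record is allowed everywhere because its "Disallow" line is empty, while every other robot falls through to the "*" record and is blocked.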

Similarly, you can exclude specific pages from indexing:

    User-agent: *
    Disallow: /client data/passwords.html

 

  Copyright © 1998-2004 by AceBIT GmbH - All rights reserved!