It's a very simple text file; you can learn all about it at the robotstxt.org website.
Why should you use it? Here are some good reasons to consider.
You can keep search engines away from content you wish to keep out of sight, but remember that your robots.txt file also attracts the attention of hackers looking for sensitive targets you might inadvertently list: you keep out the robots while inviting the hackers. Keep this in mind.
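To illustrate the risk, a rule like the following keeps well-behaved crawlers out of an admin area, but it also advertises exactly where that area lives (the path shown is hypothetical):

```
User-agent: *
Disallow: /secret-admin/
```

Anyone can fetch your robots.txt and read that path, so never rely on the file to hide genuinely sensitive content; protect it with authentication instead.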
According to Keith Hogan from Ask:
i) Fewer than 35% of websites have a robots.txt file
ii) The majority of robots.txt files are copied from others found online
iii) Robots.txt files are often supplied by your web hosting service
It appears that most webmasters aren't familiar with this file. That will matter more and more as the web continues to grow: spidering is a costly effort that search engines try to optimize, and websites that manage crawling efficiently will be rewarded.
- Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.
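A minimal file for this purpose might look like the following sketch; the /search/ path is an assumption, so substitute whatever path your site serves its result pages under:

```
User-agent: *
Disallow: /search/
```

Under the original exclusion protocol described at robotstxt.org, the Disallow value is a simple path prefix, so this blocks /search/ and everything beneath it.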
Carefully review the robots exclusion protocol available at robotstxt.org. If you must exclude numerous areas of your website, build your file in a step by step manner and monitor spider behaviour with a log analyser tool.
Recently I was called in to examine a blog burdened by very unusual and extremely heavy spidering activity: the log file I examined showed more than 8 GB of invisible (spider) traffic over a one-month period. Given the small number of daily visitors (fewer than 200) and the small size of the blog (fewer than 100 posts), something was wrong with the architecture.
It took just a few minutes to identify the problem: There was no robots.txt file.
Each request for robots.txt was redirected to the blog's home page, triggering a complete download of that page. Each download was approximately 250 KB, and there were thousands of these unnecessary hits. The spidering frenzy ceased as soon as an empty robots.txt file was created and uploaded to the server. Traffic is now down from 8 GB to 500 MB.
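For compliant crawlers, a zero-byte robots.txt like the one uploaded here is equivalent to a file that explicitly allows everything, which can also be written as:

```
User-agent: *
Disallow:
```

An empty Disallow value means nothing is excluded; either form gives crawlers a valid 200 response and stops the redirect to the home page.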