MJ12Bot

From Majestic 12 Distributed Search Engine

MJ12bot - Majestic-12's Distributed Crawler Robot

You have most likely reached this page by following a link that MJ12bot left in your log files. Below are answers to the most frequently asked questions about MJ12bot.


What is MJ12bot doing on my site(s)?

We spider the Web to build a distributed search engine. The crawler is fast, efficient, downloadable and distributed, so people with broadband connections can contribute to what we hope will become the biggest search engine in the world.

What happens with crawled data?

Crawled data is added to the search engine index. This is a work in progress, but an Alpha version of the search engine is available here.

How can I block MJ12bot?

MJ12bot adheres to the robots.txt standard. To prevent our robot from crawling your website, add the following text to your robots.txt:

User-Agent: MJ12bot
Disallow:   /
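
One way to sanity-check the rule before deploying it is to run it through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser as a stand-in; it is not MJ12bot's own parser, so it only shows how a standards-following parser reads the rule. The helper name is_blocked and the example.com URL are placeholders made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
User-Agent: MJ12bot
Disallow:   /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical URL, used only for illustration.
blocked = is_blocked(ROBOTS_TXT, "MJ12bot", "http://example.com/page.html")
other_blocked = is_blocked(ROBOTS_TXT, "SomeOtherBot", "http://example.com/page.html")
print("MJ12bot blocked:", blocked)
print("Other bots blocked:", other_blocked)
```

The rule names only MJ12bot, so other crawlers are unaffected by it.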

Please do not waste your time trying to block the bot by IP address in .htaccess. We do not use consecutive IP blocks, so your efforts will be in vain. Also, please make sure the bot can actually retrieve robots.txt itself. If it can't, it will assume (this is the industry practice) that it's okay to crawl your site.

If you have reason to believe that MJ12bot did NOT obey your robots.txt commands, please let us know via email: bot@majestic12.co.uk. Please provide the URL of your website and log entries showing the bot trying to retrieve pages that it was not supposed to.
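
To gather the log entries to send, a small script can pull the relevant lines out of an access log. This is a minimal sketch that assumes the common Apache/Nginx "combined" log format; the sample lines, IP addresses, paths and the "MJ12bot/example" user-agent string are invented for illustration, and the pattern needs adjusting for other log formats.

```python
import re

# Assumed "combined" log format: request line, status, size, referrer, user agent.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def mj12bot_requests(lines):
    """Return log lines where an MJ12bot user agent requested something other than /robots.txt."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the assumed format
        is_bot = "mj12bot" in match.group("agent").lower()
        if is_bot and match.group("path") != "/robots.txt":
            hits.append(line.rstrip("\n"))
    return hits


# Invented sample entries (documentation IP ranges, hypothetical paths and agents).
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2024:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" "MJ12bot/example"',
    '192.0.2.1 - - [01/Jan/2024:00:00:06 +0000] "GET /private/a.html HTTP/1.1" 200 512 "-" "MJ12bot/example"',
    '198.51.100.7 - - [01/Jan/2024:00:00:09 +0000] "GET /index.html HTTP/1.1" 200 900 "-" "ExampleBrowser/1.0"',
]

hits = mj12bot_requests(SAMPLE_LOG)
for entry in hits:
    print(entry)
```

With a real log, pass the open file object instead of SAMPLE_LOG, for example mj12bot_requests(open("access.log")), where the file name is again a placeholder.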


How can I slow down MJ12bot?

You can easily slow down the bot by adding the following to your robots.txt file:

User-Agent: MJ12bot
Crawl-Delay:   5

Crawl-Delay should be an integer giving the number of seconds to wait between requests. MJ12bot will wait up to 30 seconds between requests to your site. Note, however, that although it is unlikely, your site may still be crawled by multiple MJ12bots at the same time. Setting a high Crawl-Delay should minimise any impact on your site.
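
To confirm that the directive is written in a form a parser can read, the value can be read back with Python's standard urllib.robotparser (crawl_delay is available from Python 3.6). As above, this is only a stand-in for MJ12bot's own parsing, and the helper name crawl_delay_for is made up for this sketch.

```python
from urllib.robotparser import RobotFileParser

# The Crawl-Delay rule from this page, exactly as it would appear in robots.txt.
DELAY_ROBOTS_TXT = """\
User-Agent: MJ12bot
Crawl-Delay:   5
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-Delay in seconds for user_agent, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


delay = crawl_delay_for(DELAY_ROBOTS_TXT, "MJ12bot")
other_delay = crawl_delay_for(DELAY_ROBOTS_TXT, "SomeOtherBot")
print("MJ12bot delay:", delay)
print("Other bots delay:", other_delay)
```

Python's parser ignores a Crawl-Delay value that is not a plain integer, which is one reason to keep the value a whole number of seconds.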

If the information above does not answer your question, feel free to contact us: bot@majestic12.co.uk
