Here is the old news, kept here to show how long and painful the development process was. It doesn't show the sleepless nights, broken expectations, false hopes, lack of beer and many other terrible things that happened.
17/04/07 MJ12node v1.4.8
A few small (but very important!) changes in this release that can be downloaded from here.
15/02/07 MJ12node v1.4.7
A small (but important!) bug fix release that can be downloaded from here.
05/02/07 MJ12node v1.4.6
A bug fix release that can be downloaded from here.
21/01/07 MJ12node v1.4.5
Big update with comprehensive support for smart recrawls of priority buckets that will keep the search engine up to date, plus bug fixes. Please upgrade ASAP!
30/12/06 Almost 7 bln urls added taking database to well over 30 bln!
You can see breakdown of urls by TLD here.
27/12/06 MJ12node v1.4.3
A couple more fixes for some rare but possible HTML parsing errors. Please upgrade ASAP!
23/12/06 MJ12node v1.4.2
This release fixes a few rare errors that occur during archiving. HTMLparser has also been updated to v3.0.1, which fixes index out of range errors in cases where an HTML tag was cut midway at the end of the data.
22/12/06 HTMLparser v3.0.0 + MJ12node v1.4.1
Yet another version of the .NET C# HTML parser has been released, with performance more than doubled, a .NET 2.0 build, no more dependency on unsafe code, and some old incorrect-parsing bugs fixed. To make it faster we used a rather novel (in HTML parsing) approach: a context-sensitive heuristic prediction engine that minimises the number of new strings created. This feature alone is responsible for a 20% performance improvement, and a complete rewrite of the tag parsing code adds more. The heuristics engine is trained on a typical HTML tagset, but you can override that or switch it off completely. If you can beat the parsing speed with better algorithms then please let me know!
MJ12node builds have been updated with the new fast parser, which should reduce Stage 2 archiving time very nicely. 64-bit Vista/Longhorn is now supported too; get the new node here.
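For readers curious how a parser might avoid creating a new string for every tag it meets, here is a minimal sketch in Python. It is only an illustration of the general idea (look up common tag names in a prebuilt table and reuse one shared string), not the actual HTMLparser code, which is C# and whose internals are not described here. The names `COMMON_TAGS`, `build_table` and `tag_name` are hypothetical.

```python
# Hypothetical "typical tagset"; a real one would be chosen from observed HTML.
COMMON_TAGS = ("a", "b", "br", "div", "img", "li", "p", "span", "td", "tr")


def build_table(tags):
    """Index known tag names by (first byte, length) for cheap candidate lookup."""
    table = {}
    for name in tags:
        raw = name.encode("ascii")
        table.setdefault((raw[0], len(raw)), []).append((raw, name))
    return table


def tag_name(buf, start, end, table):
    """Return the tag name found in buf[start:end].

    For a known tag the shared string from the table is returned, so no new
    string is created. Unknown tags fall back to decoding a fresh string.
    """
    if start < end:
        for raw, name in table.get((buf[start], end - start), ()):
            # Compare in place against the buffer instead of slicing it.
            if buf.startswith(raw, start):
                return name
    return buf[start:end].decode("ascii", "replace").lower()


table = build_table(COMMON_TAGS)
html = b"<div><br><custom>"
print(tag_name(html, 1, 4, table))    # known tag, shared string reused
print(tag_name(html, 10, 16, table))  # unknown tag, new string created
```

This sketch matches lowercase names only; a real parser would also need to handle mixed-case tags, and the table could be replaced or disabled, much as the release notes say the tagset can be overridden or switched off.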
14/12/06 MJ12node v1.4.0
A mammoth* upgrade to MJ12node has been released. It is highly recommended to upgrade because the new version includes over 100 fixes and features too numerous to list here, but inquiring minds can see the details in this forum post. This release, yet again, marks a development shift back to the search engine.
* Much like the poor mammoths, my will to live almost became extinct during development of this version...
04/12/06 HTMLparser v2.0.0
New version of .NET C# HTML parser has been released with about 100% improvement in performance, fixed bugs and support for text encodings to parse non-English pages correctly.
20/11/06 More urls added taking database to almost 25 bln!
See for yourself here.
09/09/06 3.8 bln urls added taking database to over 20 bln!
Even more billions of discovered urls were processed, with 3.8 billion of them selected for crawl! You can see the breakdown of urls in the database by TLD (top-level domain) here. This load took the database of known urls to over 20 bln!
30/07/06 Search engine updated to v0.6.4
Continued performance improvements, bug fixes and a new index of Wikipedia's 2.5 mln articles are featured in this release.
23/07/06 Search engine updated to v0.6.3
Significant performance improvements and a rebuild of all subindices were the main goals of this update.
14/07/06 Search engine updated to v0.6.2
Completely rewritten code to support multiple indices efficiently in a multi-server environment. Functions like View Text, Explain Time, Explain Rank and Word Search now all work correctly with multiple indices (something that was broken or disabled for the 1 bln page release). Development focus has moved to weekly search engine updates.
16/06/06 MJ12node v1.3.0
A major upgrade to MJ12node has been released. It is highly recommended to upgrade because the new version includes a lot of fixes and features too numerous to list here, but inquiring minds can see the details in this forum post. This release marks a development shift back to the search engine itself.
25/05/06 New Statesman New Media Award nomination!
The project was nominated for a New Statesman New Media Award in the Contribution to Civic Society category, so wish us good luck. The winners will be announced in July 2006!
20/04/06 Biggest URL load to date: 2.8 billion urls added!
Huge analysis task that processed many billions of discovered urls and selected 2.8 billion of them for recrawl! You can see breakdown of urls in database by TLD (top-level-domain) here. This load ends recrawl and initiates regional focus.
07/04/06 MJ12node v1.2.7
A fix for the broken restart function in the Windows release of MJ12node has been released. It is highly recommended to upgrade since automatic restart is used internally in a number of cases. Users of Linux/FreeBSD builds are not affected.
06/04/06 MJ12node v1.2.6
A new version of MJ12node has been released with a number of fixes that reduce CPU and memory usage. It is highly recommended to upgrade.
23/03/06 The Guardian published an article about the project!
Michael Pollitt from the Guardian (excellent national UK newspaper) has written an article about us, check it out here!
17/03/06 1 billion pages now searchable!
Search engine updated to v0.6.0 - now with over 1 billion pages indexed and available for searching, and support for a distributed architecture that will allow the search engine to scale to many billions of pages. You can try it here! A brief description of Majestic-12 search technology is available here.
05/03/06 MJ12node v1.2.5
A new version of MJ12node has been released with a number of fixes and new support for alternative upload locations (managed automatically so you don't need to worry). This should increase the capacity of the server and also improve the quality of uploads, so please get your version updated now! A FreeBSD port of the node has also been made.
26/02/06 Search engine updated to v0.5.0 - now with over 619 mln indexed pages!
This release tripled the number of indexed pages to over 619 mln!
01/02/06 MJ12node v1.2.4
New version of MJ12node is released with important fix in internal ban list logic, please get your version updated now!
31/01/06 Search engine updated to v0.4.0 - now with over 205 mln pages!
Delivering on the promise to increase the pace of search engine updates, we deliver yet another doubling of the index, from 100 mln to over 205 mln pages! This release features considerable improvements to the quality of data and the relevancy of searches, and this is just a start, as a lot more will come in February!
20/01/06 MJ12node v1.2.3
A new version of MJ12node has been released with an important addition to the internal ban list that makes it really imperative to get your version updated now!
15/01/06 Search engine updated to v0.3.1 - now with 100 mln pages!
Significant update to the search engine with 100 mln+ pages indexed, support for phrase matching, boolean NOT logic and geo-targeting (giving higher scores to local websites for a given user). This is the first of the many updates that will follow in the coming months!
24/12/05 Merry Christmas + MJ12node v1.2.2
Majestic-12 wishes a very Merry Christmas and Happy New Year to all project participants across the world, and takes this opportunity to announce that a new version of MJ12node has been released with yet another (but hopefully the last) important robots.txt bug fixed, as well as timeout reduction logic that should improve the quality of crawling. Get it now!
10/12/05 MJ12node v1.2.1
A new version of MJ12node has been released with an important robots.txt bug fixed as well as more CPU-friendly archiving (see the Options->Archiving parameter, which is also present in profiles) - get it now!
1/12/05 850 mln URLs + MJ12node v1.2.0
A huge monstrous load of 850+ mln URLs is now in the database! To help cope with the data more efficiently, a new version of MJ12node has been released with many bug fixes and new content analysis that greatly reduces barrels - it's a must-have so please get it now! Development focus now shifts firmly towards scaling the search engine up to at least 1 bln pages. The next major update is expected within two weeks.
05/11/05 Article: Indexing good content - not junk
"Indexing good content - not junk" is an important article on the future direction of Majestic-12's indexing, designed to improve relevance while cutting down data sizes considerably.
01/11/05 Biggest URL load to date: +700 mln urls!
The biggest URL load is finally over with 700 mln extra unique URLs added taking known URLs database to almost 3 bln!
29/10/05 MJ12node v1.1.4 - critical bug-fix
Yet another version of MJ12node released with critical bug fix in links parsing code - please upgrade
28/09/05 2 bln crawled URLs
Another 1,000,000,000 crawled URLs! It took less than 2 months to hit the 2nd bln - the 1st bln took a whole 8 months of crawling!
26/10/05 MJ12node v1.1.3
New version of MJ12node released with an important bug fix and a considerable speed-up in parsing of links. It is recommended to upgrade ASAP.
19/10/05 300+ mln URLs added: over 2 bln now known! (+ MJ12node v1.1.2)
Even more URLs added to the system taking known URLs to over 2,100,000,000 unique URLs! This URL load is special because for the first time central server did not use any CPU to parse for URLs and ~3 TB (TeraBytes!) of data were parsed exclusively by distributed nodes!
MJ12node v1.1.2 was released to address a rare bug that is fatal for uploads and may result in the node stopping its uploads. It is recommended to upgrade ASAP.
15/10/05 MJ12node v1.1.1 + Teams
New version of MJ12node released with important bug fixes, new compression and an adjustment in the links parsing code to generate more URLs. You can either download it manually or use the new semi-automatic upgrade function, which will notify you about the new version shortly, or after you ask the node to check for it in the Tools menu. Details on changes in this release are here.
Statistics now support teams - click here for more information on how to create or join existing team.
07/10/05 MJ12node v1.1.0
Brand new version of MJ12node released to celebrate 1 year of work on this project! Many big features, including parsing of links from crawled data, checking for new versions and semi-automatic upgrade, better reporting of operational parameters, smaller memory and CPU usage and more! It is recommended to upgrade even though some small errors may have slipped into this release, so stay tuned for possible bug fixes in the next few days!
01/10/05 HTML parser source code released + Wallpapers
As promised a while ago, Majestic-12 has released the source code of its high performance HTML parser. This .NET C# code, which is used to process terabytes of HTML, is now available under the BSD license - try to make it run faster if you can!
Codepic took the initiative in making these great Majestic-12 wallpapers, thanks! :)
30/09/05 370 mln URLs added
Just over two weeks after the last URL load, more data had to be parsed to catch up with the new high crawling levels. 370 mln new unique URLs were added to the system, taking the number of known URLs to almost 1,800,000,000!
27/09/05 20 mln URLs crawled today!
Today the community set a new record worth remembering -- 20 mln URLs crawled, representing a whopping 433 Gigabytes (almost half a terabyte!) of raw data in less than a day! A screenshot of those who made the record is here.
26/09/05 Search engine updated to v0.2.0
Significant update to the search engine, fixing a number of bugs and introducing a new ability for users to create their own ranking formulae and see how they change search results!
20/09/05 MJ12node v1.0.8
New version of MJ12node released with bug fixes, a better URL scheduling strategy that should minimise the number of buckets left with 1 domain and lots of URLs, and a new feature that compresses crawled data 15-30% better (not enabled by default, see Options->Crawler->Enable barrel sorting). Details on what's changed
12/09/05 270 mln URLs added
Over 1.5 TB (TeraBytes!) of crawled data was parsed for new URLs, and after deduplication and filtering just over 270 mln more URLs were added to the system, taking the number of known URLs to over 1,400,000,000 (for those who get dizzy from all those zeroes -- that's 1.4 billion!)
5/09/05 1 bln crawled URLs + MJ12node v1.0.7
Today we reached a major milestone of 1,000,000,000 crawled URLs! It only took 8 months from public beta of the crawler, and next billion should take a lot less time! :)
New version of MJ12node released with lots of changes making it a worthy upgrade. Details on what's changed
29/08/05 MJ12node v1.0.6 + more URLs + Firefox search plug-in!
New version of MJ12node released with a number of bug fixes and one important change that calls for a speedy upgrade of your clients, so please get updated versions as soon as you can!
115 mln more URLs were added to the system, taking the number of known URLs to almost 1,150,000,000.
Anyone with Firefox (and why shouldn't you use the best browser available?) can now add Majestic-12 search engine plug-in to the list of search engines, don't be shy to use it to search -- this will help improve the search engine!
28/08/05 Stats upgraded!
Upgraded stats on this site now show breakdown by countries, platforms, best daily rates, more per peer stats and instructions on what HTML you need to use if you want to put some of your personal stats onto your webpages (more ideas in this area particularly welcome). Best place to start exploring new stats is from here.
16/08/05 Search engine updated to v0.1.5 + 1 bln URLs known!
Significant update to the search engine now with almost 45 mln URLs, some bugs fixed and new features such as domain clustering (site: prefix works now). This release will serve as the basis for work to improve relevance of search matches. The search engine is now hosted on a dedicated box so response time should be more reasonable now.
Separately, a significant milestone of over 1 bln known URLs has been reached! In light of considerably higher crawling rates than ever before, a special effort will be made to ensure that there is enough supply of URLs to crawl without having to panic and add them at the last minute! :)
27/07/05 Talk at Birmingham's UK Perl Mongers group
Presentation from a talk given to the Birmingham Perl Mongers user group (UK) on the topic of "Building a scalable distributed WWW search engine ... NOT in Perl!" (requires PowerPoint). :)
Last updated: 19 Apr 2018 15:58:20:503 GMT