Wednesday, July 11, 2007


A major problem for most global search engines is the simple fact that the net grows so rapidly that, no matter how many server farms they build, pages are created or updated faster than the search engines can detect and index them.
I recently came across Majestic, which has a really interesting approach to this problem: distributed crawlers. They've made a simple crawler client that spreads the indexing work across volunteers who donate spare bandwidth and computer time to this noble task, in much the same way as some people donate time and bandwidth to the SETI@home project or my personal favourite, the search for the next Mersenne prime.
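The post doesn't describe Majestic's actual client protocol, but the general shape of such a volunteer crawler is easy to sketch: the client receives a batch of URLs from a coordinator, fetches each page, extracts the outgoing links, and reports the results back. Here is a minimal, hypothetical sketch in Python; `crawl_batch`, the coordinator, and the naive `href` regex are all illustrative assumptions, not Majestic's real design.

```python
# Hypothetical sketch of a volunteer crawler client's inner loop.
# The batching protocol and link extraction are illustrative only;
# Majestic's real client (and a serious parser) would do far more.
import re
from urllib.parse import urljoin

# Naive link extractor -- a real crawler would use a proper HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)

def crawl_batch(urls, fetch):
    """Fetch each URL in the batch via the supplied fetch() callable and
    return {url: [absolute links found on that page]}, ready to upload
    to the coordinator."""
    results = {}
    for url in urls:
        try:
            html = fetch(url)
        except Exception:
            # Skip unreachable pages; a real client would report the failure.
            continue
        # Resolve relative hrefs against the page's own URL.
        results[url] = [urljoin(url, href) for href in LINK_RE.findall(html)]
    return results
```

In a real client, `fetch` would do an HTTP GET; for a quick local test you can pass a stub that serves pages from a dict, which also makes the loop easy to unit-test without any network access.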
Still, the idea of distributing the crawl seems really useful. If only they had done something novel on the search end instead of just copying Google, I would have been thrilled. But I like the idea anyway. Check it out on their site.

Oh yeah, while you're there, check out the C# source for their HTML parser. It's awesome: fast and furious!
