How Much Bandwidth Do Search Engines Take Up?

June 5th, 2005

There are an awful lot of search engines out there and they all try to index as much of the web as they can, as quickly as they can.  For this site, search engines seem to cause more traffic than anything else:

Top 20 of 720 Total User Agents
# Hits Percent User Agent
1 6866 12.63% msnbot/1.0 (+http://search.msn.com/msnbot.htm)
2 5600 10.30% Googlebot/2.1 (+http://www.google.com/bot.html)
3 3546 6.52% Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/
4 2633 4.84% Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com
5 1706 3.14% sna-0.0.1 mikemuzio@msn.com
6 1120 2.06% Mozilla/5.0 (compatible; BecomeBot/2.3; MSIE 6.0 compatible;
7 1103 2.03% NetNewsWire/2.0 (Mac OS X; http://ranchero.com/netnewswire/)
8 806 1.48% Krell-GeoScraper/0.1 libwww-perl/5.79
9 754 1.39% aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)
10 750 1.38% Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
11 746 1.37% Planet HUMBUG +http://planet.humbug.org.au/ Planet/1.0~pre1 +
12 745 1.37% Planet Linux Australia http://planet.linux.org.au Planet/0.2
13 744 1.37% Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET
14 726 1.34% Planet Apache +Unconfigured Planet Planet/1.0~pre1 +http://ww
15 702 1.29% NewsGatorOnline/2.0 (http://www.newsgator.com; 1 subscribers)
16 673 1.24% Mozilla/4.0 compatible ZyBorg/1.0 (wn-14.zyborg@looksmart.net
17 655 1.21% NetNewsWire/2.0b45 (Mac OS X; http://ranchero.com/netnewswire
18 648 1.19% Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gec
19 621 1.14% NewsGator/2.0 (http://www.newsgator.com; Microsoft Windows NT
20 601 1.11% Mozilla/5.0 (compatible; BecomeBot/1.86; MSIE 6.0 compatible;
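
To put a rough number on that, here is a small sketch that adds up the hits from the rows above that look like search engine crawlers. The list of crawler markers is my own guess at which agents are search bots (agents I could not place, such as sna-0.0.1, are left out), and the names ROWS, CRAWLER_MARKERS and is_crawler are made up for this example.

```python
# Rows copied from the table above: (hits, user agent as truncated in the report).
ROWS = [
    (6866, "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"),
    (5600, "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    (3546, "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/"),
    (2633, "Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com"),
    (1706, "sna-0.0.1 mikemuzio@msn.com"),
    (1120, "Mozilla/5.0 (compatible; BecomeBot/2.3; MSIE 6.0 compatible;"),
    (1103, "NetNewsWire/2.0 (Mac OS X; http://ranchero.com/netnewswire/)"),
    (806, "Krell-GeoScraper/0.1 libwww-perl/5.79"),
    (754, "aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)"),
    (750, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"),
    (746, "Planet HUMBUG +http://planet.humbug.org.au/ Planet/1.0~pre1 +"),
    (745, "Planet Linux Australia http://planet.linux.org.au Planet/0.2"),
    (744, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET"),
    (726, "Planet Apache +Unconfigured Planet Planet/1.0~pre1 +http://ww"),
    (702, "NewsGatorOnline/2.0 (http://www.newsgator.com; 1 subscribers)"),
    (673, "Mozilla/4.0 compatible ZyBorg/1.0 (wn-14.zyborg@looksmart.net"),
    (655, "NetNewsWire/2.0b45 (Mac OS X; http://ranchero.com/netnewswire"),
    (648, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gec"),
    (621, "NewsGator/2.0 (http://www.newsgator.com; Microsoft Windows NT"),
    (601, "Mozilla/5.0 (compatible; BecomeBot/1.86; MSIE 6.0 compatible;"),
]

# Assumption: these substrings identify search engine crawlers.
CRAWLER_MARKERS = (
    "msnbot", "Googlebot", "Yahoo! Slurp", "Ask Jeeves",
    "BecomeBot", "aipbot", "ZyBorg",
)


def is_crawler(user_agent):
    """True if the user agent contains one of the assumed crawler markers."""
    return any(marker in user_agent for marker in CRAWLER_MARKERS)


crawler_hits = sum(hits for hits, agent in ROWS if is_crawler(agent))
top20_hits = sum(hits for hits, _ in ROWS)

print(f"crawler hits: {crawler_hits} of {top20_hits} in the top 20")
print(f"crawler share of top 20: {crawler_hits / top20_hits:.1%}")
```

With the rows above, the assumed crawlers account for 21793 of the 31745 hits in the top 20. The remaining 700 user agents are not in the table, so this says nothing about the long tail.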

Admittedly, a lot of those hits will result in 304 Not Modified responses, which carry no page body. Still, when you extrapolate this to every site on the internet, that’s a lot of HTTP requests being fired around.
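
For anyone wondering how a Not Modified response comes about: the client sends the date of its cached copy in an If-Modified-Since header, and the server only sends the page again if it has changed since then. The sketch below shows that decision in isolation. It is not how this site's server is implemented, and response_status and its parameters are names invented for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def response_status(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, else 200.

    if_modified_since: raw header value from the request, or None.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return 200  # no cached copy advertised, so send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are in GMT
    # HTTP dates have one-second resolution, so drop microseconds.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


# Hypothetical page last changed at noon GMT on the day of this post.
post_changed = datetime(2005, 6, 5, 12, 0, 0, tzinfo=timezone.utc)

print(response_status("Sun, 05 Jun 2005 12:00:00 GMT", post_changed))  # 304
print(response_status("Sat, 04 Jun 2005 12:00:00 GMT", post_changed))  # 200
print(response_status(None, post_changed))                             # 200
```

A 304 still costs a request and a set of response headers, which is why the hit counts stay high even when little actual content is transferred.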

It also shows the power of syndication - almost no one actually reads this site directly (except the bots), yet lots of people link to, comment on or mention my posts (I’m certainly no A-, B- or even C-list blogger though).  That keeps the bandwidth requirements down and still gets the message out - can’t complain about that.