How Much Bandwidth Do Search Engines Take Up?
There are an awful lot of search engines out there and they all try to index as much of the web as they can, as quickly as they can. For this site, search engines seem to cause more traffic than anything else:
Top 20 of 720 Total User Agents

| # | Hits | % of Hits | User Agent |
|---|---|---|---|
| 1 | 6866 | 12.63% | msnbot/1.0 (+http://search.msn.com/msnbot.htm) |
| 2 | 5600 | 10.30% | Googlebot/2.1 (+http://www.google.com/bot.html) |
| 3 | 3546 | 6.52% | Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/ |
| 4 | 2633 | 4.84% | Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com |
| 5 | 1706 | 3.14% | sna-0.0.1 mikemuzio@msn.com |
| 6 | 1120 | 2.06% | Mozilla/5.0 (compatible; BecomeBot/2.3; MSIE 6.0 compatible; |
| 7 | 1103 | 2.03% | NetNewsWire/2.0 (Mac OS X; http://ranchero.com/netnewswire/) |
| 8 | 806 | 1.48% | Krell-GeoScraper/0.1 libwww-perl/5.79 |
| 9 | 754 | 1.39% | aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com) |
| 10 | 750 | 1.38% | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) |
| 11 | 746 | 1.37% | Planet HUMBUG +http://planet.humbug.org.au/ Planet/1.0~pre1 + |
| 12 | 745 | 1.37% | Planet Linux Australia http://planet.linux.org.au Planet/0.2 |
| 13 | 744 | 1.37% | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET |
| 14 | 726 | 1.34% | Planet Apache +Unconfigured Planet Planet/1.0~pre1 +http://ww |
| 15 | 702 | 1.29% | NewsGatorOnline/2.0 (http://www.newsgator.com; 1 subscribers) |
| 16 | 673 | 1.24% | Mozilla/4.0 compatible ZyBorg/1.0 (wn-14.zyborg@looksmart.net |
| 17 | 655 | 1.21% | NetNewsWire/2.0b45 (Mac OS X; http://ranchero.com/netnewswire |
| 18 | 648 | 1.19% | Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gec |
| 19 | 621 | 1.14% | NewsGator/2.0 (http://www.newsgator.com; Microsoft Windows NT |
| 20 | 601 | 1.11% | Mozilla/5.0 (compatible; BecomeBot/1.86; MSIE 6.0 compatible; |
Now admittedly, a lot of those hits will result in 304 Not Modified responses, but still: when you expand this to every site on the internet, that’s a lot of HTTP requests being fired around.
It also shows the power of syndication – almost no one actually reads this site directly (except the bots), yet lots of people link to, comment on or mention my posts (I’m certainly no A-, B- or even C-list blogger, though). That keeps the bandwidth requirements down and still gets the message out – can’t complain about that.
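For anyone wanting to build a table like the one above from raw logs, and to check how many of those hits really were Not Modified responses, here is a rough sketch. It assumes the server writes Apache's combined log format; the function names (`tally_agents`, `report`) and the sample log lines are made up for illustration and are not taken from this site's logs.

```python
import re
from collections import Counter

# Assumed layout (Apache combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
# Lines whose quoted fields contain escaped quotes will not match and are skipped.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally_agents(lines):
    """Return (hits per user agent, 304 responses per user agent)."""
    hits = Counter()
    not_modified = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in the assumed format, ignore it
        agent = match.group("agent")
        hits[agent] += 1
        if match.group("status") == "304":
            not_modified[agent] += 1
    return hits, not_modified


def report(lines, top=20):
    """Return rows of (hits, percent of all hits, 304 count, user agent)."""
    hits, not_modified = tally_agents(lines)
    total = sum(hits.values())
    return [
        (count, 100.0 * count / total, not_modified[agent], agent)
        for agent, count in hits.most_common(top)
    ]


# Invented sample lines, purely to show the expected input shape.
SAMPLE = [
    '192.0.2.1 - - [06/Jun/2005:10:00:00 +1000] "GET /feed/ HTTP/1.1" 304 - "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [06/Jun/2005:10:05:00 +1000] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [06/Jun/2005:10:06:00 +1000] "GET /feed/ HTTP/1.1" 304 - "-" "ExampleReader/2.0"',
    'this line is not a log entry',
]

if __name__ == "__main__":
    for count, percent, cached, agent in report(SAMPLE):
        print(f"{count:6d} {percent:6.2f}% {cached:6d}  {agent}")
```

To run it over a real log, pass an open file object instead of `SAMPLE`, e.g. `report(open("access.log"))`, where the filename is whatever your server uses. The 304 column gives a feel for how much of the crawler and feed-reader traffic is just "has it changed yet?" polling.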

June 6th, 2005 at 8:30 pm
Similar statistics on my site (funny, that; it’s the same server ;-) ). Incidentally some of the hits (I’m looking at you, Yahoo Slurp!) are for the same documents over and over – and, largely in my case, they return 404 for a damned good reason, i.e. they don’t exist.