Live Search Keeps Spamming My Site

I use two tools to gather usage statistics on my site. One is Analog, a simple web server log analysis tool provided by my web host; the other is Google Analytics, which registers a hit only if a visitor’s browser executes a bit of JavaScript embedded in my web pages.

My site does not attract that much traffic, but I get enough data to observe major trends and get a general feel for what’s going on. Over the few years that I have kept this site I have seen the share of Firefox climb to 40% of all visits and the share of Internet Explorer drop to just over 50%. I have also seen the use of Linux growing steadily, now accounting for about 4% of all visitors, which is great but probably not a good estimate of the overall situation, given the specifics of my site.

Over the last year I have been puzzled by the following discrepancy: while Google Analytics reports that over 80% of search queries come from Google, Analog reports that Live Search is by far the top referrer on my site, logging more than 10 times (!) as many queries as Google. So after a while I decided to investigate, though it took some time to find a spare Friday to pull the log files from the server and take a closer look.

Turns out, these queries are Microsoft’s way of detecting cloaking, a technique used by spammers to manipulate search engines into linking to their pages: the spammer’s server detects crawlers and presents them with content that is different from what regular visitors see.
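To make the trick concrete, here is a minimal sketch of what a cloaking server does; the crawler name fragments and page text are illustrative, not taken from any real spam site:

```python
# Sketch of cloaking: inspect the User-Agent header and serve crawlers
# keyword-stuffed content while humans get the real page.
CRAWLER_TOKENS = ("googlebot", "msnbot", "slurp")  # common crawler UA fragments

def choose_content(user_agent: str) -> str:
    """Return crawler bait if the UA looks like a known bot, the real page otherwise."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return "keyword-stuffed page tailored for the search index"
    return "actual page shown to human visitors"

print(choose_content("msnbot/1.0 (+http://search.msn.com/msnbot.htm)"))
print(choose_content("Mozilla/5.0 (Windows; U; Windows NT 5.1)"))
```

Microsoft’s anti-cloaking queries presumably use an ordinary browser User-Agent precisely so a check like this one lets them through.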

With my spare Friday getting closer to bedtime, here are a few thoughts. First, since these queries all come from Microsoft’s IP block, they are just as easy to detect as the Live Search bot itself, so the scheme cannot possibly be effective, especially after running for almost a year. Second, Yahoo and Google have somehow managed to deal with this problem without blanketing the Web with useless queries. And third, is cloaking really so prevalent that it calls for such drastic measures?
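The first point is easy to demonstrate: a spammer needs only a few lines of log analysis to flag hits from a known address range. A sketch, where the prefix (131.107.0.0/16, one of Microsoft’s historical ranges) and the sample log lines are illustrative, not taken from my actual server logs:

```python
# Flag log entries whose client IP falls inside a given network block.
import ipaddress

MICROSOFT_BLOCK = ipaddress.ip_network("131.107.0.0/16")  # illustrative prefix

def from_block(logline: str) -> bool:
    """True if the first field (client IP) of a common-format log line is in the block."""
    ip = ipaddress.ip_address(logline.split()[0])
    return ip in MICROSOFT_BLOCK

sample = [
    '131.107.12.34 - - [01/Jan/2008:00:00:01] "GET /?q=foo HTTP/1.1" 200 512',
    '66.249.65.10 - - [01/Jan/2008:00:00:02] "GET / HTTP/1.1" 200 1024',
]
suspect = [line for line in sample if from_block(line)]
print(len(suspect))
```

A cloaker who already sniffs User-Agent strings could just as easily run this check and show the anti-cloaking probe the same bait page the crawler got.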
