One of the segments of the blogosphere that is starting to appear is analytics. I was the initial investor in NetGenesis (now owned by SPSS) and Mobius Venture Capital was an investor in I/Pro and Andromedia (now part of Macromedia) back when we were SOFTBANK Venture Capital.
So – it should be no surprise that I’m fascinated with tools to help with analytics on my site. Simple hit counters – like Site Meter – are pretty useless because of the difference between hits and feeds. I stumbled upon Feedburner a few weeks ago (May 8th to be exact) and have been watching as my stats build daily. I’ve learned a few interesting things, such as the most popular User Agent (Newsgator – followed closely by SharpReader) and my most popular click-through (On Being the CEO – Henry V and The Cover-up). However, I couldn’t really figure out what the actual statistics were telling me.
So – I fired off the following message to the Feedburner guys:
I’m trying to understand the actual statistics.
As of today, I have 981 new visitors.
Today – May 25 – I have 69 new visitors so far.
Are the 69 new visitors my TOTAL visitors today, or do I have 69 new visitors reading my site (in addition to the other (981 – 69) that were “total new visitors” as of yesterday)?
Dick Costolo from Feedburner responded immediately.
I’ll forward you an email below that I sent to another publisher with a similar question. It is long and detailed, but should shed some light on what we’re really reporting to you. We are going to be updating the statistics page in the near future to provide significantly more transparency to what you are really seeing, now that we ourselves have a more robust handle on how the feeds are accessed by the multitude of clients.
I hope this helps and doesn’t just make things more confusing… again, we will be updating the stats page to provide much more transparency to these things in the very near future. Please feel free to follow up with any other questions.
I’ll fill you in on the story here, you’re going to get a longer answer than you might have expected. I’m taking the time to write this all out to you myself because I also want to write something up for the FeedBurner blog, so apologies for the length! There are two things going on:
a) The RSS space is emergent and if you read my recent posts at www.burningdoor.com/feedburner, you’ll see that there are a huge number of feed readers and aggregators (over 300 different pieces of software that poll us for feeds with some frequency). An issue related to this fact is that some of these readers do not send the appropriate HTTP headers in their requests. Specifically, even if they have already requested the feed once, their later requests do not include the “If-Modified-Since” HTTP request header. So, our “new visitors” figure measures the number of first-time HTTP requests we’ve received for your feed, but depending on how many of your readers are using clients or aggregators that don’t correctly implement “If-Modified-Since” you may see inflated numbers to some unknown degree.
b) You may also see “under reported” numbers of visitors for the following reason: it is considered a sort of courtesy, a la the robots.txt file, for feed aggregators like Bloglines and my.yahoo to include the number of subscribers that they are polling on behalf of. However, some of the aggregators don’t send us this number.
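To make the two issues in Dick’s note concrete, here is a minimal Python sketch of how a feed server might tell a first-time request from a repeat one, and how it might read a subscriber count from an aggregator. This is my own illustration, not Feedburner’s code: the function names are hypothetical, and I’m assuming the aggregator puts its count in the User-Agent string as “N subscribers”, which the email does not actually specify.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Assumed User-Agent convention, e.g. "SomeAggregator/1.0 (42 subscribers)".
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def respond_to_feed_request(headers, feed_last_modified):
    """Return (status, is_conditional) for one poll of the feed.

    A client that never sends If-Modified-Since always looks like a
    first-time request, which is what inflates the "new visitors" number.
    """
    raw = headers.get("If-Modified-Since")
    if raw is None:
        return 200, False
    try:
        since = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return 200, False  # unparseable date: treat as unconditional
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    if feed_last_modified <= since:
        return 304, True  # nothing new since the client's last fetch
    return 200, True


def subscribers_reported(user_agent):
    """Return the subscriber count an aggregator reports, or None."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    last_modified = datetime(2004, 5, 24, 12, 0, tzinfo=timezone.utc)
    header = format_datetime(datetime(2004, 5, 25, tzinfo=timezone.utc), usegmt=True)
    print(respond_to_feed_request({"If-Modified-Since": header}, last_modified))
    print(respond_to_feed_request({}, last_modified))
    print(subscribers_reported("SomeAggregator/1.0 (42 subscribers)"))
```

A reader that omits the header gets a full 200 response every time it polls, so from the server’s side it is indistinguishable from a brand new visitor.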
FINALLY, the short answer to your question is that your total number of visitors is probably something like 2/3 of your hits number. We are going to be doing a big overhaul of the stats pages in the next month or so, and that will give you a much better picture of your visitor numbers, as we will be looking to report directly to publishers the number of “likely” total visitors they have based on a formula of “requests from different IP addresses from well-behaved clients” + “any new request from a well-behaved client” + a rollup of total subscribers across all aggregators that are polling on behalf of multiple subscribers, not factoring in any new requests from our known list of misbehaving clients.
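The “likely visitors” formula is easier to follow as code. The sketch below is one possible reading of it, not Feedburner’s implementation. I’m assuming that the first term counts distinct IPs among repeat (conditional) requests, that the second counts every first-time request, and that the latest (largest) subscriber count per aggregator is the one rolled up. The `FeedRequest` type, the `likely_visitors` function and the “BadReader” client are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class FeedRequest:
    ip: str
    client: str                        # feed reader or aggregator name
    conditional: bool                  # True if If-Modified-Since was sent
    subscribers: Optional[int] = None  # set only by multi-subscriber aggregators


def likely_visitors(requests: Iterable[FeedRequest], misbehaving: Set[str]) -> int:
    """Estimate total visitors from one day's feed requests (assumed reading)."""
    returning_ips = set()
    new_requests = 0
    aggregator_subs = {}
    for req in requests:
        if req.client in misbehaving:
            continue  # known-bad clients are left out entirely
        if req.subscribers is not None:
            # Keep the largest count seen for each aggregator.
            previous = aggregator_subs.get(req.client, 0)
            aggregator_subs[req.client] = max(previous, req.subscribers)
        elif req.conditional:
            returning_ips.add(req.ip)  # repeat polls collapse to one visitor
        else:
            new_requests += 1
    return len(returning_ips) + new_requests + sum(aggregator_subs.values())


if __name__ == "__main__":
    sample = [
        FeedRequest("1.1.1.1", "Newsgator", True),
        FeedRequest("1.1.1.1", "Newsgator", True),
        FeedRequest("2.2.2.2", "SharpReader", False),
        FeedRequest("3.3.3.3", "Bloglines", True, 40),
        FeedRequest("3.3.3.3", "Bloglines", True, 42),
        FeedRequest("4.4.4.4", "BadReader", False),
        FeedRequest("4.4.4.4", "BadReader", False),
    ]
    print(likely_visitors(sample, misbehaving={"BadReader"}))
```

In the sample data, the two Newsgator polls count as one returning visitor, the SharpReader request as one new visitor, Bloglines contributes its 42 subscribers, and the misbehaving client is ignored.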
Clearly we’ve got a long way to go on the analytics front, but it’s encouraging that folks like Dick and his team are already thinking about the issues. This is very similar to the early NetGenesis days – we had lots of data, but didn’t really know how to make sense of it and – as the traffic increased – the quality of our tools and analysis followed.