Hi, I’m Brad Feld, a managing director at the Foundry Group who lives in Boulder, Colorado. I invest in software and Internet companies around the US, run marathons and read a lot.

Your Analytics Data Is Very Wrong

Comments (15)

I’ve written about this in the past so I expect this is nothing new for you, my dear reader.  The title summarizes everything I am going to say.

My buddy Fred Wilson had a comScore chart about Delicious’ growth (or lack of growth) in his hugely popular We Need A New Path To Liquidity post.  He used this data to make the point that web companies are languishing under the ownership of their acquirers when they get bought relatively early in their life.

The founder of Delicious – Joshua Schachter – disagreed with Fred’s conclusion on Delicious and Fred wrote an updated post titled Delicious where he corrects his assertion and asks the (probably rhetorical) question "I wonder how many other web apps are accessed via third party services (twitter’s traffic is largely through its api)? And if that’s a growing trend, then what does that mean for our ability to measure audiences, traffic, and growth from a distance?"

I’ve been a web analytics junky since my first ever angel investment – $25k in NetGenesis (it was net.Genesis at the time) – back when a "web log" was an uncomfortable thing to ponder.  I’ve watched, used, and invested in several generations of web analytics companies.  I am comfortable making the statement that "whenever one becomes a dominant analytics platform, it immediately starts to decline in accuracy."

While the graphs and tables might be pretty and are almost always used by the "leaders" to assert their "leadership", they distort and misrepresent what’s really going on.  When comScore first published their Widget Metrix in 2007, Om Malik correctly compared it to a Jellybean Contest.  I’ve yet to meet a widget report that is remotely accurate based on my inductive reasoning (i.e. so far I’ve been able to come up with at least two widget providers that belong in the top 10 but are missing from every list I’ve seen.)

Now, I don’t mean to pick on comScore.  I’ll pick on a friend.  My FeedBurner reader data shows that I have 117k readers (or subscribers) to my Feld Thoughts blog.  While I’m flattered, this is bullshit.  When I dig into the actual user agent data, I find that 98,966 come from a feed reader called BlogRovR.  I happen to know that BlogRovR is what used to be called Activeweave Stickies, which is a company I looked at 18 months ago.  They "autosubscribe my feed" whenever someone installs BlogRovR (which means my subscriber count is inflated by around 99k – I imagine some of the BlogRovR people look at my feed, but certainly not 99k of them.  Do the math.)  Oh – everyone else who is autosubscribed by BlogRovR (A VC, TechCrunch, …) has the same subscriber count inflation.
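For anyone who wants to actually do the math, here is a rough sketch in Python. It assumes the "117k" total is a round 117,000 (FeedBurner's exact figure isn't given here), and it lumps every non-BlogRovR user agent into a single made-up bucket called `all_other_agents`. The function name `adjusted_subscribers` is just for illustration and is not part of any FeedBurner API.

```python
# Assumption: "117k" is treated as exactly 117,000; the real total is not stated.
REPORTED_TOTAL = 117_000
# From the post: subscribers attributed to the BlogRovR user agent.
BLOGROVR = 98_966


def adjusted_subscribers(counts, auto_subscribed):
    """Sum subscriber counts, skipping user agents known to auto-subscribe."""
    return sum(n for agent, n in counts.items() if agent not in auto_subscribed)


# Hypothetical breakdown: everything that is not BlogRovR goes in one bucket.
counts = {
    "BlogRovR": BLOGROVR,
    "all_other_agents": REPORTED_TOTAL - BLOGROVR,
}

adjusted = adjusted_subscribers(counts, {"BlogRovR"})
inflation_share = BLOGROVR / REPORTED_TOTAL

print(adjusted)                   # 18034
print(f"{inflation_share:.0%}")   # 85%
```

This is the pessimistic bound: it throws out every BlogRovR subscriber, even though some of them presumably do read the feed. The real number sits somewhere between roughly 18k and 117k, which is exactly the problem.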

While it makes me feel all warm inside that I have the number 117k visibly displayed on my blog and I show up as #9 on Rating Burner, this is just a very personal example of why "your analytics data is very wrong."

At some level, there isn’t anything wrong with the analytics data being wrong (or inaccurate) – that’s the nature of the beast and why anyone who uses analytics data to figure stuff out should use multiple sources to generate their own analysis.  However, I’m regularly amazed by how many conclusions are derived from data sets that have known, fundamental flaws.

As always, check your assumptions.

  • Manav Misra

    Brad, I usually follow your blog through my Google sidebar webclips service (unless I want to follow a link or post a comment in which case I actually open the post in a browser). While I subscribe to various blogs that way, I don't necessarily read all postings. Do you know how these kinds of subscriptions get counted–as one subscription, or are individual page views counted, or do the reporting services give you reports of both?

    • http://www.feld.com Brad Feld

      I’m not 100% sure how FeedBurner is counting the Google webclips service, but I think it gets lumped in with the Google Feedfetcher agent. If this is the case, you’ll be counted as one additional subscriber. I think your page views will only get counted when you open the page in a browser.

      • http://www.b5media.com Jeremy Wright

        Yeah, it's a single agent for everything Google does. iGoogle, Reader, everything.

  • http://www.b5media.com Jeremy Wright

    Brad: Analytics have always, and likely will always, be useful mostly as an internal measurement tool (ie: how much am I growing month over month). I've never believed in public analytics companies (except for the largest sites – and even then only in the % of market share – not in the specific numbers, ie: uniques, pages, etc).

    That said, when doing research, it is possible to draw some conclusions if you compare site A to site B on a decent system like comScore, Quantcast, etc. You won't know specific pageviews of Site B, but you'll likely know if they're a similar size, an order of magnitude higher or an order of magnitude lower.

    And, when part of your business is acquisitions, that quick checkpoint is a useful piece of information :)

  • Marc A. Meyer

    Hi Brad,
    Just a touch of clarification:

    There's a bit more complexity to how the BlogRovR statistics come to be.
    When a new user registers and downloads BlogRovR they're offered bundles of blogs in various areas they may be interested in, to start them off. Feld Thoughts is in the tech bundle, 'cause we read you and we're always enlightened by your musings.

    For those who accept the default bundle EXPLICITLY, they're counted as a subscriber. Now, that is what might happen in google reader as well. But in reader, they might well never return and read his blog. How many feeds in your feedreader do you never actually read? That's a form of “inflated” count as well.

    What's really novel and cool about BlogRovR, its sole reason for existing, is that whenever a RovR user visits any page on the web that Brad has written ABOUT, or the page of some other notable blogger who also is writing about something you, Brad, have written about, they get to see your post immediately, right there! I think this makes them an even more engaged and valid reader than all but the small minority of readers who regularly consume all the content on your site.

    So, I don't agree that our stat is “very wrong.” It is certainly a bit “different;” it's not exactly comparing apples and oranges, but RSS and similar technologies lead to novel forms of consumption which aren't always readily captured by the simple “subscriber” concept of a newspaper.

    Are the wildflowers out yet, Brad? I'm pining for a long hike out there.

    Cheers,
    Marc Meyer
    CEO, Activeweave BlogRovR

  • http://www.marypurslow.com/ mary

    I couldn't agree more with your conclusion to this thought about how many people draw conclusions that can severely impact their business from known flawed data. This might sound mean, but those who are knowingly (or even unintentionally) making decisions by following these numbers 100% make it better for those of us who diversify our sources, run the comparisons and then give or take a little percentage-wise. There are plenty of sources (some of which you have listed) that can help focus our decision making. Great post; you could have definitely gone into more detail, but good food for thought.

  • http://www.blogrovr.com Jean Sini

    Hi Brad,

    great post. Jean Sini, BlogRovR founder and CTO, here. Aside from the specific aspects of how BlogRovR counts subscribers (and that Marc highlighted above), the general issue you point to is attention, and that's certainly something I find isn't well measured.

    I don't know that RovR is any worse an offender than any other reader but, more broadly, I think, like you, that along with page views, subscriber count isn't doing a great job of capturing attention.

    Unfortunately, the notion of knowing exactly who's reading you, when, and how (are they scanning your post in half a second, Scoble-style, or getting to enjoy the finer points, deep down in that fourth paragraph?) is very much an elusive one.

    This has been said before: we need a better metric (don't we always?); at the same time, getting there might imply a lot of behavior tracking, with all the privacy ramifications. Until then, we're stuck with some form of inflation factor to account for posts going unread in feed readers.

    Cheers.

    • http://www.feld.com Brad Feld

      Totally agree. Again, my issue is actually not with EITHER FeedBurner or BlogRovR – I was just using this as an easy to identify and explain example of the deeper problem that Fred pointed out in these Delicious / comScore numbers.

  • Todd Sampson

    Good stuff Brad.

    A lot of people (outside Yahoo!) made a big deal when it looked like MyBlogLog's stats were dropping on Compete.com. There is no way to say, “Hey, we were bought by Yahoo! TONS of our content is now being served off of the yahoo.com domain which we are not getting credit for.”

    I even did a post on our real stats, but MyBlogLog's Blog doesn't have 117k readers either. ;)

    http://mybloglogb.typepad.com/my_weblog/2008/02/2

    Todd Sampson
    Co-founder, MyBlogLog

  • http://jeremystein.net jeremy

    I'd like to see someone trace the flow of content across the web. then we can really begin to have proper analytics. something similar to Tumblr's reblog feature that lets you see who has reblogged your content. it will be interesting to see the infectious nodes.

  • http://www.ioergercreative.com Roderick

    Feld, Great blog post, I wrote some awesome coverage of it for MarketingPilgrim. But during my review process I noticed you used profanity in your post; unfortunately Andy Beal doesn't allow coverage of any posts using profanity. If you ever choose to edit this post please let me know.

    • http://learntoduck.com micah

      well thats fucked.

    • http://www.feld.com Brad Feld

      Shit. Oh well. I’ll try to keep it cleaner the next time, although it goes against my nature.

  • http://www.bijansabet.com bijan

    interesting. but what I really want to know is how do I get included in BlogRovR

    <g>

  • http://www.livelovecoffee.com/ coffeeguy

    I saw the other day that comScore was proved inaccurate in its rating for Google, and showed a slowing of visitors. This caused a sharp drop in Google's stock that day, only to have the comments revised. Not too sure how accurate comScore is.
