Widget Stats

Some of the companies I invest in are innovation factories – it feels to me like something interesting comes out of them at least once a week (and sometimes daily).  FeedBurner and NewsGator both fit in this category – it’s a combination of having amazing developers, a core platform architecture, and a philosophy of iterating quickly on small product increments (rather than the “big bang – take 12 months and release a product” approach).  While the spirit of web-based apps encourages this, I’m amazed at the number of startups (including some that I’m an investor in) that have a rapid innovation cycle at the beginning of their development, but then slow down significantly as they add headcount and complexity to their application (more on this in another post).  Hint to those of you who want to go faster – consider using an agile methodology and tooling.

One of the innovation factories that I’m an investor in is Lijit.  Stan James and Todd Vernon are obsessed with getting stuff out the door on a short cycle, testing it in the real world, and iterating.  Every time I turn around Stan shows me something amazing (Stan – when does the world get to see Bubbles?) – on Monday it was the first “known” (at least to me) widget stat report.  The methodology is described below:

These statistics are based on a crawl of 8552 blogs done over April 11-15, 2007. This crawl was “centered” on blogs with the Lijit widget (or as we call it, Wijit), and expanded outwards by following the blogrolls. Due to some bugs in how we stored this data, we had to throw out a large number of blogs. These stats only include numbers that we are sure are correct. This is our first time, after all! For this sample we take a wide definition of widgets, including non-visible widgets such as Google Analytics. Also note that we do not include image-only widgets; this excludes the FeedBurner subscriber count badge and the LinkedIn badge. Also not included are “widgets” that are automatically added by a blogging platform, like those from Blogger or Typepad. Because our crawl expanded outwards by blogrolls, and because of the general disconnect between the traditional blogosphere and social networks, widgets from sites like MySpace are not included in these stats.


In writing up this first crawl and analysis, Stan clearly acknowledges what assumptions they made and what didn’t work on the crawl, and hints at what they’ll be tuning in their algorithm.
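For the curious, here is a rough sketch of what a crawl like this might look like in code: start from a set of seed blogs, expand outwards by following blogrolls, and count how many blogs each widget shows up on. This is purely my illustration and not Lijit's actual crawler. The function name, the in-memory dictionaries standing in for fetched pages, the `example` blog names, and the `platform-navbar` label are all made up for the example.

```python
from collections import Counter, deque


def crawl_widget_counts(seeds, blogroll, widgets, excluded=frozenset(), max_blogs=None):
    """Breadth-first crawl outward from seed blogs via blogroll links.

    blogroll: hypothetical mapping of blog -> list of blogs it links to.
    widgets:  hypothetical mapping of blog -> list of widgets found on it.
    excluded: widgets to ignore (for example, ones a platform adds automatically).
    Returns (number of blogs visited, Counter of blogs per widget).
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(queue)
    counts = Counter()
    visited = 0

    while queue:
        if max_blogs is not None and visited >= max_blogs:
            break
        blog = queue.popleft()
        visited += 1

        # Count each widget at most once per blog, skipping excluded ones.
        for widget in set(widgets.get(blog, ())) - set(excluded):
            counts[widget] += 1

        # Expand outward: queue every blogroll entry we have not seen yet.
        for linked in blogroll.get(blog, ()):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)

    return visited, counts


# Toy data (entirely hypothetical) standing in for real fetched pages.
blogroll = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "d.example"],
    "c.example": [],
}
widgets = {
    "a.example": ["lijit", "google-analytics"],
    "b.example": ["google-analytics", "sitemeter"],
    "c.example": ["google-analytics", "platform-navbar"],
    "d.example": ["flickr"],
}

visited, counts = crawl_widget_counts(
    ["a.example"], blogroll, widgets, excluded={"platform-navbar"}
)
for widget, n in counts.most_common():
    print(f"{widget}: on {n} of {visited} blogs")
```

Notice how the choice of seeds shapes the result: any blog that nobody in the seed neighborhood links to never gets visited, which is the same kind of built-in bias Stan calls out in his methodology.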

As a data junkie, I found this intensely interesting.  The top 10 widgets on the crawl were Google Analytics, Google Syndication, Sitemeter, Technorati, Flickr, Statcounter, Mybloglog (damnit guys – why did you sell to Yahoo – we could have built something really big), FeedBurner, Blogrolling, and icio.us (del.icio.us).  The next 30 are an interesting collection of things to pay attention to (and a few surprised me with either their relative popularity or their relative unpopularity).

While this is still a small sample size (8552 blogs) with built-in bias (the crawl was centered on blogs with the Lijit Wijit), it’s a fascinating start.  Analytics widgets dominate, with Search relatively light.  The power curve (or long tail for those of you who don’t like math) is front and center in the analysis.

You should assume that there’s a method to the madness with this type of crawl (rather than just a desire to make a set of pretty graphs and satisfy my need for data.)  That’s part of a really interesting thing that you’ll see soon from Lijit.