Month: November 2005
David Baldacci is in my regular “mental floss” book rotation – whenever one of his books comes out it immediately ends up in the “read when you need a break from serious stuff” pile. His latest – The Camel Club – didn’t disappoint me. It started off slower than most Baldacci books and it took me 100 pages to get completely sucked in. When I finished, I realized that he needed more time to set up all the characters in this one, as he had a large number of interconnected plots. My evening was sacrificed to the reading gods as I downed 300 pages of riveting storytelling. Spies, conspiracies, government corruption, secret societies, old CIA training facilities, presidential kidnapping, secret service heroics, many bad guys (some Americans), a bunch of scary dudes running the country including a few that thought nuclear war was a solution, computer hacks, megalomaniacs, near-miss surveillance incidents, characters with major pasts coming back to haunt them, OCD, a little romance, and excellent gunplay made for a good evening.
Lyle Lovett’s “It Ought To Be Easier” was the 195th song on Amy’s iPod Shuffle playlist that I manually recreated for her this morning.
Someone at Apple on the iPod user experience crew needs to go to the Apple Store in Palo Alto, buy two shuffles, three iPods of different flavors, and a G5. Then, they need to go to Fry’s and pick up two desktop PCs, a server, two laptops, and a wireless router (I assume all the machines have wireless cards in them.) Now – set it all up, rip all your music, store it on the PC server (or Mac server, I don’t care), point all your iTunes clients at the right directory on your server, and make a bunch of playlists. Yeah – that was fun, wasn’t it.
Now, for shits and giggles, synchronize your server music files on one of your laptops (c’mon, it’s not that hard to figure out how to do it, but you will need new and exciting software.) Make a playlist on that laptop (using iTunes of course), associate a Shuffle with it, and copy the playlist to your Shuffle. Now, give your laptop to someone else (e.g. your IT guy because you got a new laptop – just pretend). Try – just try – to get the Shuffle to automatically associate with another computer without wiping out the playlist.
Now, stare at your orphaned Shuffle for a while. Your wife – who you love very much – just wants to make a few changes to it. Try to explain to her why it’s not that easy (since nothing recognizes what’s on the Shuffle). Watch as she looks at you as though you are a total freak of nature. Continue to watch as she starts to scream and then cry. Seriously.
You – like me – will become determined to figure out how to get this damn playlist off the orphaned Shuffle and into a copy of iTunes so your wife can be in music change happy land. Anapod looked like it would work, but after spending $30 on it, it turns out that it doesn’t handle playlists (and recreating them in iTunes) with Shuffles very well (it works very nicely with the other iPod versions.) Thankfully, Anapod can figure out what’s on the Shuffle and gives you a nice list of it. Try to figure out how to print this list. You’ll eventually give up and resort to opening a Word document, hitting Shift-PrtSc, and Ctrl-V into the Word doc, followed by PgDn in Anapod, Shift-PrtSc, … until you’ve got a nice document of your screen shots (5 pages to get the full 250 or so songs in the playlist). Then – print (to your network printer of course).
Hold your breath. Plug the Shuffle back into your wife’s desktop computer (I’m not going to make that laptop mistake again – this is her computer). Fire up iTunes. Song by song, recreate her playlist. After 30 minutes of this, walk outside and scream at the top of your lungs (or – if you have neighbors – go in your garage and scream).
Fucking stupid. There’s got to be a better way. Apple has mastered the user experience. These are not the droids you want. Oh – and don’t bother trying to explain this to your wife – just smile and show her how to update her playlist on her desktop computer.
I had lunch with Stan James on Friday at Pasquini’s Pizzeria. Stan is the creator of Outfoxed and was introduced to me by Seth Goldstein, who is one of the guys behind AttentionTrust.org and has recently launched Root Markets (Seth has a long essay up about Root Markets – Media Futures: From Theory to Practice – that is very interesting (and complex) if you are into this stuff).
Stan’s moving to Boulder to be in the middle of the Internet software development universe (ok – he’s moving back here because it’s a much better place to live than Silicon Valley, but don’t tell anyone). We spent a bunch of time getting to know each other, talked about the research he’d been doing for his master’s thesis in Cognitive Science at the University of Osnabrueck, and how this led to Outfoxed. Oh – and we ate a huge delicious pizza.
I’d been playing with Outfoxed for a few days on my computer at home (I have a computer at home that I’ll install anything on) and was sort of getting it. An hour with Stan helped a lot. When I combine what Outfoxed is figuring out for me with the data I’m getting from Root’s Vault (my clickstream / attention data) I can see how this could be really useful to me in a few weeks once I’ve got enough data built up. More in a few weeks.
We then started talking about something I’ve been thinking about for a while. My first business was a software consulting business that built database application software. As a result, the construct of a relational database was central to everything I did for a number of years. In the mid 1990’s when I started doing web stuff, I was amazed at how little most people working on web and Internet software really understood about relational databases. This has obviously changed (and improved) while evolving rapidly as a result of the semantic web, XML, and other data exchange approaches. But – this shit got too complicated for me. Then Google entered the collective consciousness and put a very simple UI in front of all of this for search, eliminating the need for most of humanity to learn how to use a SELECT statement (ok – others – like the World Wide Web Wanderer by Matthew Gray (net.Genesis) and Yahoo did it first – but Google was the tipping point.)
I started noticing something about a year ago – the web was becoming massively denormalized. If you know anything about relational databases, you know that sometimes you denormalize data to improve performance (usually because of a constraint of your underlying DBMS), but usually you want to keep your database normalized. If you don’t know about databases, just think denormalization = bad. As a result of the proliferation of user-generated content (and the ease with which it was created), services were appearing all over the place to capture that same data – reviews (books, movies, restaurants), people, jobs, stuff for sale. “Smart” people were putting the data in multiple places (systems) – really smart people were writing software to automate this process.
Voila – the web is now a massively denormalized database. I’m not sure if that’s good or bad (in the case of the web, denormalization does not necessarily equal bad). However, I think it’s a construct that is worth pondering as the amount of denormalized data appears to me to be increasing geometrically right now.
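To make the normalized-vs-denormalized distinction concrete, here’s a toy sketch (not from the post – the table and column names are made up for the example) of the same book review stored both ways:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized: each fact lives in exactly one place.
cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("CREATE TABLE reviews (book_id INTEGER, site TEXT, review TEXT)")
cur.execute("INSERT INTO books VALUES (1, 'The Camel Club')")
cur.execute("INSERT INTO reviews VALUES (1, 'my-blog', 'riveting')")

# Denormalized, web-style: the same review pasted into every service
# that captures it -- update one copy and the others go stale.
cur.execute("CREATE TABLE web_copies (site TEXT, title TEXT, review TEXT)")
for site in ("bookstore", "my-blog", "review-site"):
    cur.execute(
        "INSERT INTO web_copies VALUES (?, 'The Camel Club', 'riveting')", (site,)
    )

copies = cur.execute(
    "SELECT COUNT(*) FROM web_copies WHERE title = 'The Camel Club'"
).fetchone()[0]
print(copies)  # one fact, three copies
```

In a single DBMS you’d denormalize deliberately and keep the copies in sync; on the web, nobody owns the synchronization, which is exactly the phenomenon above.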
Stan and I talked about this for a while and he taught me a few things. Stan is going to be a huge net-add to the Boulder software community – I’m looking forward to spending more time with him.
Nope – I’m not referring to an HTTP 404 error. I’m talking about Sarbanes-Oxley (SOX) Section 404. If you are a software developer and don’t know what SOX is, it’s the law enacted after all the corporate accounting scandals in the early 2000’s that was intended to increase the accuracy and reliability of corporate disclosure. There are many opinions on SOX, including the widely held opinion among the VC community that SOX has been instrumental in discouraging companies from going public. Love it or hate it, we’ve all got to live with it.
Niel Robertson – the CTO of Newmerix – has written a good overview article on SOX 404 and what it means to the IT organization and the software developer. Distilling the government regulations and accounting stuff into a short article that is relevant and useful is non-trivial – Niel did a nice job of it. If you are involved in creating, managing, or deploying software that impacts financial systems, it’s worth a read.
I woke up this morning thinking about User Agents (ok – I was also thinking about Naomi Watts and Sean Penn who were amazing in 21 Grams.) A commenter on my Personalize Feed post pointed out that most of the big online aggregators include subscriber counts in their user-agent headers when the aggregator polls the RSS feed.
While I agree that using the User Agent to report the number of subscribers to a feed is a good approach, I’ve noticed a bizarre pattern lately. Since I use FeedBurner to manage my feed, I’ve got great visibility into which aggregators are providing their subscriber numbers. I’ve got enough subscribers at this point (> 4500) that the law of large numbers is working for me – while I won’t claim to have perfect “aggregator market share data”, I’ve got a pretty good feel for it (which is really useful given my investment in NewsGator.) As I wrote in my Blog Analytics post, I love numbers and have long invested in, benefited from, and paid attention to web analytics. So – the number of subscribers to my feed (among other feed and blog related data) are near and dear to my heart.
However, it turns out that a number of aggregators don’t report subscriber count. I’ve got 128 distinct aggregators polling my feed on a daily basis according to FeedBurner. 76 of them have one subscriber, which most likely means they don’t report the number of subscribers via User Agent (yeah – some of them probably only have one subscriber, but not all of them.) The top 10 aggregators polling my feed (in order) are Bloglines, NewsGator Online, FeedDemon, NetNewsWire, Firefox Live Bookmarks, Rojo, Google Desktop, SharpReader, NewsGator Enterprise, and Thunderbird.
Ok – that’s sort of interesting – but more interesting is the number of aggregators that don’t report number of subscribers. Google Reader doesn’t. Microsoft is noticeably missing (and has a User Agent lodged down in the subscriber = 1 category). And – the granddaddy of them all (My Yahoo) – after reporting a number that landed them regularly in my top four, recently stopped reporting number of subscribers and now shows one. Others in the subscriber = 1 list that stand out include OddPost (Yahoo again), Pluck Firefox and Web Edition (ironically the Pluck Internet Explorer Edition reports, just not in the top 10, although that might be because it’s not an online reader), Pubsub, and SearchFox.
Since a web-based aggregator only needs to request the feed once and can then use that cached version for all its users, putting the subscriber count in the User Agent is a good citizen move that simply helps the publisher of the feed (e.g. me – I want to know the number of subscribers I have.) I’ve got to believe that it’s useful to the aggregator to report the number of subscribers since it helps the publisher understand how popular the aggregator is, and publishers will try to promote the services where they have a lot of subscribers. Plus – as a publisher – while I don’t know the demographics of my subscribers, at least I’d like to know how many I have. Given how trivial it is for the aggregator to report this, it baffles me why some of the aggregators, including Google, Microsoft, and Yahoo, don’t do this.
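For anyone who hasn’t looked at one of these headers, here’s a minimal sketch of how a publisher might pull the count out. The sample User Agent strings follow the informal “(…; N subscribers)” convention that aggregators like Bloglines used; the exact strings here are illustrative, not copied from real logs:

```python
import re

# Matches the "N subscribers" convention some aggregators embed in
# their User-Agent header when polling a feed.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)

def subscriber_count(user_agent: str) -> int:
    """Return the reported count, or 1 when the aggregator says nothing."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else 1

print(subscriber_count("Bloglines/3.1 (http://www.bloglines.com; 4577 subscribers)"))  # 4577
print(subscriber_count("SomeDesktopReader/1.0"))  # 1 -- lands in the subscriber = 1 bucket
```

The fallback to 1 is exactly why the non-reporting aggregators all pile up in the subscriber = 1 category above.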
Of course, there are issues with simply reporting the subscriber count. While I get an aggregate subscriber count, I don’t get any real information about the number of subscribers that are actively reading my blog via an aggregator. Also, people don’t tend to delete their user-ids at online aggregators if they don’t use the service anymore (e.g. I know I have a Kinja account – remember Kinja – they don’t report subscribers either – with feeds in it that I imagine happily polls stuff daily for me). Finally, some services automatically subscribe users to a bunch of feeds, skewing the number of subscribers for that aggregator.
That said, the User Agent approach is a good one that I figured was worth decomposing more based on the comment I got suggesting most of the big online aggregators support subscription counts. While a lot of aggregators are playing nice (at least 33% of the ones that poll my feed), a bunch aren’t, including Microsoft, Google, and Yahoo. What’s up with that guys?
Have you ever made an error as a result of a formula problem in a spreadsheet? Nope – I haven’t either <g>. A long time friend and uber-accounting-dude Frank Lincks sent me this somewhat painful story of Eastman Kodak’s recent SEC non-reliance 8–K filing due to a spreadsheet error. Quoting from the 8–K:
“…the Company has concluded that the severance error that occurred in the second quarter, as described above, was primarily the result of a failure in the operation of … the existing preventive and detective controls surrounding the preparation and review of spreadsheets that include new or changed formulas. The Company has concluded that this situation constitutes a “material weakness,” as defined by the Public Company Accounting Oversight Board’s Auditing Standard No. 2. The Company believes that this material weakness will be remediated by December 31, 2005.”
I was in an Oxlo board meeting recently and Todd Vernon (Raindance CTO who is on the Oxlo board) said “I’ve subscribed to your del.icio.us tag feed – I like it more than your blog because it tells me what you are thinking about.” I was mulling this over when a few days later I saw Jason Calacanis’ Wired & Tired post where he called out “subscribing to Fred’s blog” as Tired but “subscribing to Fred’s del.icio.us feed” as Wired. Jason used almost the same phrase as Todd – “I prefer to read what Fred is *considering* blogging about.”
It dawned on me that my del.icio.us tag is akin to the first derivative of what I’m thinking about. For a while, I struggled with tagging stuff as I didn’t get the benefit. Now, I tag everything I read on line that I find interesting and am finding myself referencing my del.icio.us tags regularly to find stuff that I’d tagged a few days earlier (rather than going to Google and doing a new search – hmmm.)
I went ahead and FeedBurnered my del.icio.us tag feed so it’d be easy to subscribe to. FeedBurner’s BuzzBoost feature made it easy for me to put the most recent tags up on my blog (on the left hand column – I chose to list the last 10). del.icio.us’s new Tagroll let me quickly put a nifty tag cloud up on my blog also (on the right hand column).
As an investor in a number of companies that do stuff with RSS (NewsGator, FeedBurner, Technorati, and Judy’s Book) and a fan and active user of others (e.g. del.icio.us, FeedBlitz, SixApart), I’ve been seeing a lot of “this sucks, that’s great, that sucks, this is great” blog posts lately, but rarely do I see anyone decompose what’s actually bad or great and explain why. Occasionally there’s some stuff from an end-user perspective (especially whenever Google rolls something out), but I’ve been surprised by the general lack of technical depth and public debate. Ok – maybe I’m reading the wrong feeds – but I’m trying.
While I’m a nerd, I’m on the investor side of the equation instead of the engineer side of the equation. As a result, I’m always looking for the analog of the thing I’m experiencing – “what – in the past – was like this thing that is now happening that can provide insight into what the future is going to be like?” I spend a lot of time thinking about this with regard to RSS (and blogs, user-generated content, online advertising, content organization, search, tools, platforms, trendy buzzwords to try to describe everything, and a preponderance of VC investors diving into an area just to get bets on the table.) I don’t pretend I necessarily have a clue technically (ok – I pretend, but I don’t have a clue) – but I know enough to be able to play around with things, look at what I think is going on under the hood, and make (at the minimum) provocative suggestions (often wrong, but at least provocative) about what I think is happening.
Personalized RSS feeds are one of the issues that hit me in the face recently. In the past few weeks, I’ve subscribed to a few RSS feeds that were personalized just for me. Specifically, when I subscribed, the URL that ended up in FeedDemon / NGOS (the aggregator that I use) had a unique identifier at the end. If I subscribed a second time (pretending I was a different person), I got a different unique identifier and ended up with two feeds. This is distinct from a feed that I’ve customized, such as a del.icio.us tag feed, which is still a generic feed that presumably multiple people will subscribe to if they use the same parameters that I do.
Now – I believe that RSS feeds that are personalized for a particular subscriber’s preferences will become an important tool in the content syndication world, just as static html gave way to CGI, cookies appeared, or broadcast opt-in email (Dear Sir:) evolved into narrowcast (Dear Brad:). However, I think the early attempts at brute force personalization by assigning unique feed URLs as a means of tracking subscribers can cause several problems.
- Web-based aggregators aren’t going to put up with having 10,000 feeds in their database that are essentially the same feed. This places an undue burden of polling and synching on the aggregator, it’s inefficient, and of course, many of the aggregators will ultimately collapse these into a “single” feed. It’s fundamentally inefficient for the publisher for exactly the same reasons. A year ago, there was a lot of noise about “overpolling of RSS” (e.g. aggregators that polled every minute). Most aggregators have addressed this issue, but the personalized feed phenomenon could start it back up.
- This approach breaks OPML reading lists. If I’ve got a unique URL feed in my OPML, then when somebody imports my curated collection of feeds, they end up subscribing to a personalized feed, and now you’ve got multiple people subscribed to a personal feed. The stats are no longer accurate for the publisher and my OPML friend is now getting “Dear Brad:” stuff.
- Once anybody subscribes to the feed in a web-based aggregator like NewsGator Online or My Yahoo, when people search for that topic they’ll find one or more personal feeds and subscribe to them. Now you have N people subscribed to a personal feed, the publisher thinks all the subscribers are coming from that one person, and you’ve lost an accurate count of the number of subscribers. In addition, the new subscribers get the original personalized feed, which may not be configured the way they want (or thought it was). Finally, in some cases, the search will turn up numerous feeds that cover the same topic, making it hard to determine which one to subscribe to.
- If you are the publisher and you eventually want to change the way you distribute feeds, it’s no longer a matter of redirecting one URL, you now have to go herd the countless subscribers to countless URLs out there in the wild.
Fundamentally, the approach that I’m starting to see appear gives a false sense of an accurate subscriber count (presumably one of the goals of personalization), doesn’t scale for the aggregators, lets the subscriber count quickly diverge from reality as people search for or share feeds, and makes it hard to redirect your subscribers correctly if you decide to do something different later.
There’s got to be a better approach.
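As a footnote, the collapsing that web-based aggregators end up doing can be sketched in a few lines. This is a hypothetical illustration, not how any particular aggregator works, and the tracking parameter names (“uid”, etc.) are assumptions for the example:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Hypothetical per-subscriber tracking parameters an aggregator
# might learn to strip; real personalized feeds vary.
TRACKING_PARAMS = {"uid", "subscriber", "token"}

def canonical_feed_url(url: str) -> str:
    """Strip per-subscriber tracking parameters so N personalized
    URLs map to one feed the aggregator actually polls."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

a = canonical_feed_url("http://example.com/feed.xml?topic=rss&uid=12345")
b = canonical_feed_url("http://example.com/feed.xml?topic=rss&uid=67890")
print(a == b)  # both subscribers collapse to the same feed
```

Of course, the moment the aggregator does this, the publisher’s per-subscriber tracking is gone – which is the tension the post describes.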
In my last post in the Business Plan series, you discovered that my first company was a “software company.” What kind? Even in 1987, the software industry covered a wide variety of things. As I mentioned before, the Introduction section of a business plan should start with a general overview (“microcomputer software”) and then get specific (“database stuff”). By the end of this section, the market segment you are targeting (in this case – “semi-custom software”) should be defined.
The early days of microcomputing were plagued by a lack of software. The hardware component of the system evolved the quickest; software developers were always playing catch-up. Support tools for software companies were limited, forcing the programmer to invest a significant amount of time in developing toolboxes for use on a particular machine. As a result, software developers often sacrificed the flexibility of the applications they were developing in order to simply get them to work. This changed with the advent of the IBM PC, which prompted an epidemic of fourth-generation application development languages, more commonly known as database languages. For the first time, the microcomputer software developer could focus on the issues underlying his application instead of spending all his time trying to implement the application on a particular piece of hardware.
Today, database languages are beginning to replace conventional languages within the context of application development. The trend has reached a critical mass; new database languages are emerging weekly. This outbreak of products has served to legitimize a new approach to application development.
These database programming languages can serve as a basis for a new type of application development – semi-custom software. Semi-custom software is a divergence from existing software product offerings. In a semi-custom environment, the benefits of mass-produced systems and custom software are combined. A reusable “shell” is designed for a specific industry – this is the systems component. Instead of packaging and selling only the shell, a semi-custom company modifies it to fit the client’s specific needs. Some custom software is written and integrated with mass-produced software. As a result, the customer essentially gets a software product that fits his precise needs (emulating a custom product) at a systems software price.
In 1987, “semi-custom software” was a new concept. Database languages (or 4GLs) were becoming popular (remember dBase and .dbf files?) and were starting to incorporate mainstream programming capabilities (most notably procedural abstraction – a big deal at the time). The things we now call “packaged applications” were going by pre-ERP TLAs such as MRP, CAD, CAM, and POS (“point-of-sale”, although most were about as good as the other use of the acronym). We felt like there must be something in between a custom application and shrink-wrapped software and decided to try to coin the phrase “semi-custom software.” While this phrase didn’t stick, the concept ended up being very relevant and foreshadowed the packaged software revolution.