As a user, how often have you thought “I wish this web service was faster”? As a CEO, how often have you said “just make it faster”? Or, more simply, “why is this damn thing so slow?”
This is not a new question. I’ve been thinking about this since I first started writing code (APL) when I was 12 (ahem – 33 years ago) on a computer in the basement of a Frito-Lay data center in Dallas.
This morning, as part of my daily information routine, I came across a brilliant article by Carlos Bueno, an engineer at Facebook, titled “The Full Stack, Part 1.” In it, he starts by defining a “full-stack programmer”:
“A “full-stack programmer” is a generalist, someone who can create a non-trivial application by themselves. People who develop broad skills also tend to develop a good mental model of how different layers of a system behave. This turns out to be especially valuable for performance & optimization work.”
He then dissects a simple SQL query (DELETE FROM some_table WHERE id = 1234;) and gives several quick reasons why performance could vary widely when this query is executed.
It reminded me of a client situation from my first company, Feld Technologies. We were working on a logistics project with a management consulting firm for one of the largest retail companies in the world. The folks from the management consulting firm did all the design and analysis; we wrote the code to work with the massive databases that supported this. This was in the early 1990s and we were working with Oracle on the PC (not a pretty thing, but required by this project for some reason). The database was coming from a mainframe and by PC standards was enormous (although it would probably be considered tiny today).
At this point Feld Technologies was about ten people and, while I still wrote some code, I wasn’t doing anything on this particular project other than helping at the management consulting level (e.g. I’d dress up in a suit and go with the management consultants to the client and participate in meetings.) One of our software engineers wrote all the code. He did a nice job of synthesizing the requirements, wrestling Oracle for the PC to the ground (on a Novell network), and getting all the PL/SQL stuff working.
We had one big problem. It took 24 hours to run a single analysis. Now, there was no real-time requirement for this project – we might have gotten away with it if it took eight hours, as we could just run the analyses overnight. But it didn’t work for the management consultants or the client to hear “ok – we just pressed go – call us at this time tomorrow and we’ll tell you what happened.” This was especially painful once we gave the system to the end client, whose internal analyst would run the system, wait 24 hours, tell us the analysis didn’t look right, and bitch loudly to his boss, who was a senior VP at the retailer and paid our bills.
I recall having a very stressful month. After a week of this (where we probably got two analyses done because of the time it took to iterate on the changes requested by the client for the app) I decided to spend some time with our engineer who was working on it. I didn’t know anything about Oracle as I’d never done anything with it as a developer, but I understood relational databases extremely well from my previous work with Btrieve and Dataflex. And, looking back, I met the definition of a full-stack programmer all the way down to the hardware level (at the time I was the guy in our company who fixed the file servers when they crashed with our friendly neighborhood parity error or a NetWare “device driver failed to load” error).
Over the course of a few days, we managed to cut the run time down to under ten minutes. My partner Dave Jilk, also a full-stack programmer (and a much better one than me), helped immensely as he completely grokked relational database theory. When all was said and done, a faster hard drive, more memory, a few indexes that were missing, restructuring of several of the SELECT statements buried deep in the application, and a minor restructure of the database were all that was required to boost the performance by more than 100x.
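To make the “missing index” point concrete, here is a minimal sketch using Python’s built-in sqlite3 module rather than Oracle. The shipments table, its columns, and the index name are made up for illustration and have nothing to do with the actual project. It asks the database how it plans to run the same SELECT before and after an index exists on the column in the WHERE clause.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is a 4-tuple; the last column is the textual description.
    return [row[3] for row in rows]


# Hypothetical table standing in for a large operational data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (id INTEGER PRIMARY KEY, store_id INTEGER, units INTEGER)"
)
conn.executemany(
    "INSERT INTO shipments (store_id, units) VALUES (?, ?)",
    [(i % 500, i) for i in range(50_000)],
)

sql = "SELECT SUM(units) FROM shipments WHERE store_id = ?"

# Without an index on store_id, the planner has to read every row.
plan_before = query_plan(conn, sql, (42,))

# Add the missing index, then ask for the plan again.
conn.execute("CREATE INDEX idx_shipments_store ON shipments (store_id)")
plan_after = query_plan(conn, sql, (42,))

total = conn.execute(sql, (42,)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("total units for store 42:", total)
```

The exact wording of the plan text varies by SQLite version, but the first plan should describe a full scan of the table and the second a search using the new index. The query returns the same answer either way; only the amount of work changes, which is the kind of gap that can hide inside a 24-hour batch run.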
When I reflect on all of this, I realize how important it is to have a few full-stack programmers on the team. Sometimes it’s the CTO, sometimes it’s the VP of Engineering, sometimes it’s just someone in the guts of the engineering organization. When I think of the companies I’ve worked with recently that are dealing with massive scale and have to be obsessed with performance, such as Zynga, Gist, Cloud Engines, and SendGrid, I can identify the person early in the life of the company that played the key role. And, when I think of companies that did magic stuff like Postini and FeedBurner at massive scale, I know exactly who that full-stack programmer was.
If you are a CEO of a startup, do you know who the full-stack programmer on your team is?
About a month ago I wrote a post titled Trying Gmail For A Week. I haven’t thought about Outlook, Entourage, or Mac Mail for a month and I don’t think I’m ever going back. It took about a week to rewire my brain for how conversations worked and what the keyboard shortcuts were, but now that I’m there it’s just awesome.
A few weeks ago Fred Wilson wrote a post titled Inbox Zero. In it he mentioned two Gmail services he found indispensable – Priority Inbox (from Google) and Unsubscribe.com (from James Siminoff who created Phonetag, another great service.) I agree with Fred on both of these, but have discovered a few extra things that are killer. I’ll list them below and for balance talk about a few shortcomings.
Priority Inbox: I’ve seen numerous tweets and blogs about how Priority Inbox doesn’t really do much. These are wrong / misinformed reactions. The trick to Priority Inbox, like many other things, is to actually use it for a few weeks. Part of using it is training it by quickly marking things up to “important” (by clicking +) or down to “everything else” (by clicking -). A small configuration change can make Starred emails (for quick follow up) a different category. I found that it only took about three days of this before I saw benefit and now (a month later) Priority Inbox gets it right 99 out of 100 times. I get over 500 emails a day – there is a long list of them that fall in “Everything Else”. I used to have to check / clear email obsessively throughout the day to stay at Inbox Zero. With Priority Inbox I’m finding solid email stretches a couple of times during the day are more than enough for me to stay on top of everything.
Unsubscribe.com: Like many people, I’m stuck in the endless “unsubscribe from email lists” infinite loop. I get vigilant for a few days and do the annoying unsubscribe drill one by one and knock a few off the list, but within a few weeks I’ve got even more. I’ve never seemed to be able to eliminate all the stuff I don’t want, especially around an election when it all escalates like crazy. With Unsubscribe.com, I simply click the Unsubscribe button in Gmail and the service gets rid of it. Don’t bother with the trial – trust me and just pay $19 for the service for a year if endless mailing list email that you don’t want is a problem for you.
Google Voice: I’ve had a Google Voice number for a long time but I never fully switched over to it. The Google Voice integration with Gmail has tipped me over. I’ve been dreaming about getting rid of my desktop phone for a while – I now find myself making almost every call from my computer except when I’m not online (where I have to use my cell phone). More importantly, video chat and text chat are completely integrated within Gmail, so from one screen I have email, my phone (inbound and outbound calling), Skype-equivalent video chat, and text chat. While I still use Skype extensively (I’m bradfeld) I find I’m using it much less as I end up using email@example.com instead.
Gist: I’m an investor in Gist and use it for my unified contact manager. Google Contacts is ok, but has a long way to go. But Gist integration with Gmail at a data level is superb. I’m still using Gmail’s consumer service so the integration is primarily at a data level, but I’m now playing around with a full switch over to Google Apps and the Gist + Google Apps integration (via the Google Apps Marketplace) just rocks. In addition, there’s a new browser-based Gist add-on coming out shortly (hint hint) that will provide direct integration into the consumer version of Gmail.
GooTasks: Since I am an Inbox Zero guy, I don’t keep anything (including paper), but I do have a short task list of things like blog posts I’m going to write. I went through an Evernote phase recently but it’s overkill for me. Google Tasks is perfect, but I didn’t have an obvious way to sync with my iPhone. Now I do.
There are a handful of annoying things. The biggest one is that I have multiple accounts on Google (firstname.lastname@example.org as well as email@example.com) and they aren’t tightly integrated across all services. The other is the weak / inconsistent iPhone integration which keeps pushing me toward using an Android phone full time (I’m now carrying both an Android phone and an iPhone.) My dad’s recent story on the Samsung Fascinate has me seriously considering a full time switch over to Android.
My “while I’m working” migration from a full Windows / Outlook / Exchange / Office world to an almost completely non-Microsoft world has been fascinating. I’m in Seattle next week including a 24 hour stretch at Microsoft for some stuff – maybe it’ll come up and be an interesting discussion that my friends at Microsoft can learn from. In the meantime, I think the next big switch will be an organizational one: moving completely over to a Google Apps infrastructure.
Every major software or web company I’ve ever been involved in has had a catastrophic outage of some sort. I view it as a rite of passage – when this happens when your company is young and no one notices, it gives you a chance to get better. But eventually you’ll have one when you are big enough for people to notice. How you handle it and what you learn from it speak volumes about your future.
Last week, two companies that we are investors in had shitty experiences. SendGrid‘s was short – it only lasted a few hours – and was quickly diagnosed. BigDoor‘s was longer and took several days to repair and get things back to a stable state. Both companies handled their problems with grace and transparency – announcing that all was back to normal with a blog post describing in detail what happened.
While you never ever want something like this to happen, it’s inevitable. I’m very proud of how both BigDoor and SendGrid handled their respective outages and know that they’ve each learned a lot – both in how to communicate about what happened as well as ensuring that this particular type of outage won’t happen again.
In both cases, they ended up with 100% system recovery. In addition, each company took responsibility for the problem and didn’t shift the blame to a particular person. I’m especially impressed with how my friends at BigDoor processed this, as the problem was caused by a new employee. They explain this in detail in their post and end with the following:
“Yes, this employee is still with us, and here’s why: when exceptions like this occur, what’s important is how we react to the crisis, accountability, and how hard we drive to quickly resolve things in the best way possible for our customers. I’m incredibly impressed with how this individual reacted throughout, and my theory is that they’ll become one of our legendary stars in years to come.”
I still remember the first time I was ever involved in a catastrophic data loss. I was 17 and working at Petcom, my first real programming job. It was late on a Friday night and I got a call from a Petcom customer. I was the only person around so I answered the phone. The person was panicked – their hard drive had lost all of its data (it was an Apple III ProFile hard drive – probably 5 MB). The person was the accounting manager and they were trying to run some process but couldn’t get anything to work. I remember discerning that it seemed like the hard drive was fine but she had deleted all of her data. Fortunately, Petcom was obsessive about backups and made all of their clients buy a tape drive – in this case, one from Tallgrass (I vaguely remember that they were in Overland Park, KS – I can’t figure out why I remember that.)
After determining the tape drive software was working and was available, I started walking the person through restoring her data. She was talking out loud as she brought up the tape drive menu and started pressing keys before I had a chance to say anything, at which point she pressed the key to format the tape that was in the drive. I sat in shock for a second and asked her if she had another backup tape. She told me that she didn’t – this was the only one she ever used. I asked her what it said on the screen. She said something like “formatting tape.” I asked again if there was another backup tape. Nope. I told her that I thought she had just overwritten her only backup. Now, in addition to having deleted all of her data, she had wiped out her backup. We spent a little more time trying to figure this out, at which point she started crying. I doubt she realized she was talking to a 17 year old. She eventually calmed down but neither of us knew what to do next. Eventually the call ended and I went into the bathroom and threw up.
I eventually got in touch with the owner of Petcom (Chris) at his house who told me to go home and not to worry about it, they’d figure it out over the weekend. I can’t remember the resolution, but I think Chris had a backup for the client from the previous month so they only lost a month or so worth of data. But that evening made an incredible impression on me. Yes, I finished the evening with at least one illegal drink (since the drinking age at the time in Texas was 18.)
It’s 28 years later and computers still crash, backups are still not 100% failsafe, and the stress of massive system failure still causes people to go in the bathroom and throw up. It’s just part of how this works. So, before you end up in pain, I encourage you to think hard about your existing backup, failover, and disaster recovery approaches. And, when the unexpected, not anticipated, not accounted for thing happens, make sure you communicate continually and clearly what is going on, no matter how painful it might be.
Ever since I switched to the Mac, I’ve had N people (where N is a suitably large number) tell me that I should switch to Gmail from Exchange. I finally decided to try it for a week and see if it works for me. Given my Mac experience, where I had to commit and really use it, I’ve decided to do the same with Gmail.
For now, I’m just going to use Gmail (instead of Google Apps) because I don’t want to go through the hell of switching the feld.com domain since I’ve got a bunch of other people (e.g. my family members) on it in a variety of configurations. That’ll limit me a little as I won’t be able to use the Apps Marketplace, but the benefit is I’ll be able to mess around with a variety of other Gmail stuff.
If you’ve got Gmail add-ons, hints, tips, and tricks, leave them for me here. At the end of next week, I’ll either be switching to Gmail or heading back to Mac Mail against my Exchange server.
This morning, as I was waiting for my laptop to grind through its startup process I started wondering why I had a laptop. I travel a lot and had it with me in San Francisco and Los Angeles this week, but hardly used it. And, when I did, I was frustrated with how long I had to wait for it to “get started”.
Today, while I was waiting for my laptop to sync email (Outlook 2010), I grabbed my iPad, opened mail, and read, replied to, or deleted all of the email that came in overnight. I was finished processing the email before my laptop was ready to be used.
I had this same experience yesterday morning in LA. Except then I processed all of my overnight email on my HTC EVO phone which was also acting as the hotspot for my laptop to connect. And, throughout the day, I just did email on my phone instead of firing up my laptop.
The only time I used my laptop last week was a three+ hour stretch in San Francisco when I was at First Round Capital’s office (thanks Josh for the use of your desk) in between meetings. I had turned on my laptop at 8:45am when I got to FRC’s office, did a board meeting from 9am to 12 (the laptop was in a different room), and then used my laptop from noon until I left around 3:30. By noon it had fully synched itself.
As I write this, I realize that Android and Apple both sync faster with my email on an Exchange data store than my Windows 7 laptop with Outlook. A lot faster. It doesn’t seem to matter whether I’m connecting over 3G or Wi-Fi – my Android phone, iPad, and iPhone are ready to go right away whereas my laptop takes anywhere from 5 to 15 minutes to get into a fully usable state (where the disk isn’t grinding and slowing things down, Outlook isn’t non-responsive, and nothing else funky is going on). I’m on a Lenovo X300 with 4GB of RAM so it’s not the hardware.
I wrote this post on my iPad using the cute little iPad keyboard dock. It appears my laptop is once again usable, but it’s probably too late for me this morning. Time for a run.