It’s An Agile World

My post on How to Fix Obamacare generated plenty of feedback – some public and some via email. One of the emails reinforced the challenge of “traditional software development” vs. the new generation of “Agile software development.” I started experiencing, and understanding, agile in 2004 when I made an investment in Rally Software. At the time it was an idea in Ryan Martens’ brain; today it is a public company valued around $600 million, employing around 400 people, and setting the pace for the world of agile software development.

The email I received described the challenge of a large organization when confronted with the kind of legacy systems – and traditional software development processes – that Obamacare is saddled with. The solution – an agile one – just reinforces the power of “throw it away and start over” as an approach in these situations. Enjoy the story and contemplate whether it applies to your organization.

I just read your post on Fixing the Obamacare site.

It reminds me of my current project at my day job. The backend infrastructure that handles all the Internet connectivity and services for a world-wide distributed technology was built by a team of 150 engineers overseas. The infrastructure is extremely unreliable, and since there’s no good auditability of the services, no one can say for sure, but estimates vary from a 5% to 25% failure rate of all jobs through the system. For three years management has been trying to fix the problem, and the fix is always “just around the corner”. It’s broken at every level, from the week-long deployment processes to the 50% failure rate for deploys to the inability to scale the service.

I’ve been arguing for years to rebuild it from scratch using modern processes (agile), modern architecture (decoupled web services), and modern technology (Rails), and everyone has said “it’s impossible and it’ll cost too much.”

I finally convinced my manager to give me and one other engineer two months to work on a rearchitecture effort in secret, even though our group has nothing to do with the actual web services.

Starting from basic use cases, we architected a new, decoupled system from scratch and chose one component to implement. It corresponds roughly to 1/6 of the existing system.

In two months we were able to build a new service that:

  • scales to 3x the load with 1/4 the servers
  • operates at seven 9s reliability
  • deploys in 30 seconds
  • was implemented by 2 engineers, compared to an estimated 25 for the old system

Suddenly the impossible is not just possible, it’s the best path forward. We have management buy-in, and they want to do the same for the rest of the services.

But no amount of talking would have convinced them after three years of being entrenched in the same old ways of doing things. We just had to go build it to prove our point.

  • Dil-Dominé Jacobe Leonares

    It’s so sad we have a government that pays so much money to government contractors to develop a faulty website. These contractors are just pocketing the money; they should be paid based on deliverables just like any vendor. I’ve been trying to log in for the past month – it’s always down. They’re always doing maintenance Friday through Sunday (no one is around, so they turn off the website). But I honestly feel like we can’t do anything; I wish we could clear both houses and institute term limits.

    • We have term limits. They’re called elections. If people want their rep out so badly, vote them out. Of course, what happens is that people fall back to “Congress SUCKS… my rep is pretty good though.”

      Brad – one thing to note is that the 2 guys built 1/6 of the system in 2 months, not the whole thing. Still a good demonstration of the power not only of agile as a process but of modern tech.

      As I’ve understood this, though, it wasn’t so much the team building the site as the management and shifting regulatory and technical requirements. If memory serves, the team building the site had hardware and software requirements changed 6 or 7 times. That cripples any effort to build a reliable site under time pressure. Your correspondent didn’t have that happen since it was under the radar. Take that public within the company and you have to clamp down on all of the manager types who want this changed and that updated and why are you using Rails, you need to use PHP because Corp IT says so and…

  • I’m still scratching my head over the Oregon exchange (Cover Oregon) which is still not fully operational. The press has reported payments to Oracle of $43.2M + another $28.5M expected this year, and apparently they kicked things off in 2011 to the tune of $45.9M. I find all of this difficult to reconcile with Oracle’s Agile Product Lifecycle Management.

  • This was a tipping point moment that exemplified the clash of old vs. modern software development. If this doesn’t tip every big old IT department on the planet to rethink their development practices and vendor alliances, I’m not sure what will.

    • Duh software

      It’s most interesting that people think this was waterfall. The description of dev sounds a lot more agile: no fixed service requirements, for example – they changed at least 4 times. Agile is the worst choice for services, fine for GUI tweaking. Services have to be rock solid or you get the mess that is a lot of enterprise projects. RoR is fine for your dept suggestions app, but it won’t scale to 10s of thousands of transactions per minute. Use the right approach for what you’re building or all you’re doing is masking the failure point just a little longer.

      • “Agile is the worst choice for services” – I can’t agree with that.

        • Duh

          You don’t have to. I’ve seen the results of agile SOA failures more than once or twice, and had to clean them up.

          • So what. At least with agile, you can see and correct things faster. That’s the key benefit.

          • Duh

            I guess that’s why so few agile projects fail after running far over budget and schedule? If your response to that is “they weren’t doing agile right,” my request is: point me to the “right” way.

      • williamhertling

        I’m doing tens of thousands of transactions per minute on Rails on an agilely developed service. 🙂

        • Duh 2

          I doubt that seriously, unless you’re counting pure web hits as transactions, or are running 100+ servers. I ran 20k+ db based logins per min across about 30 servers. This was a while ago, current hardware would be less than 10.

  • Appeos

    That story reminds me of so many of the IT development projects I’ve worked on over the years (and decades).

    It became so bad that eventually I thought “there must be a better way” for the thousandth time, but this time I had a lightbulb moment and realised that there was. This was really the genesis of my latest startup. People say a startup should address a pain point and after so many IT projects, I’d seen plenty of pain, mostly on the faces of the executives who spent millions on new systems, then were told that it’d be years late, cost way more than they were promised and only do half what they expected. And of course, it would need to be replaced as soon as it was deployed, as it was so far delayed that it was now out of date, so they’d have to go round the loop again.

    I too find it hard to understand why governments are so keen to reward total failure with hundreds of millions of dollars. Too many times, the same companies get hired, charge enormous sums, fail totally, then get paid again to fix the mess that they created. I dread to think how many billions of taxpayers’ money goes down the pan every year.

    Anyway, I could write about this for 100,000 words. In fact, I was going to write a book along the lines of The Phoenix Project, but although that would be amusing, I thought it better to build a company that addresses these issues in a meaningful way. My test sites prove the concept, now I just need to make it widely available. Then maybe we can see our tax dollars going to worthwhile things, not wasted, and see companies who adopt the new approach flourish and leap ahead of their competitors. Watch this space…

    • Steve Hayes

      Dear Appeos
      A gentle hint to somebody newly in business for themselves. When making a persuasive plug it’s good to put your company name in the text, or at least enough to be googled with.

      • Hi Steve, no problem. It wasn’t meant to be a pitch and so I didn’t include contact details. Anyone motivated can find me pretty easily, so that’s not something I’m worried about right now.


    The systems built today on open source technologies offer a back door for the hackers who will probe every hole in the system. Serving 300 million clients, with many touch points for the records they produce, will be a daunting task for any firm. I wish them well.

  • Ryan Martens

    Thanks for sharing the story. There are four big things at play here:
    1. Old technology
    2. Bad test/dev/ops environment
    3. Large teams working on a monolithic architecture, where it is hard to keep from running into each other and hard to isolate errors
    4. Waterfall thinking that is designed to implement a plan, with high developer utilization, low variance and stage gates

    I think the person’s story is one of fixing all these at once with a small team. I am NOT surprised to hear that they got 1/6th done in two months with two folks. In other contexts, you might not have all of these problems at such an extreme level and the decision to rewrite gets tougher. In either case, the answer to the problem is more agility. Adding more agility to your deployment process, your architecture, your development process and your teams will always deliver good things, no matter if you rewrite or not. Good agile teams tend to run with slack that allows for constant hacking to make things better. This mode of continuous improvement is the only way to avoid these kinds of situations in the long term, and it works well when you build your solution from a steel thread that is 1/6 to 6/6 of the capabilities.

    Thanks for your insights.


  • Vince Vega

    I can’t stop laughing at Geddy Lee with his mullet trying to Monday Morning quarterback the site. Like all Agile cultists, the answer is always “Agile” and Ruby On Rails. I bet you voted for Obama, too. Thanks for causing 15 million people to lose their health insurance this year (“If you like your insurance, you can keep it. Period.”) and face much higher premiums and 90 million to lose their health insurance next year. It is time to Repeal Obamacare and Impeach Your Boy Wonder.

  • tttttt

    When I worked on a giant waterfall project, I always wondered why we code as if the requirements will not move in the next two years.
    When I read about agile projects, I wonder how they can assume no system will require up-front design and years to build. For example, you really can’t iteratively deploy an airplane, crashing each one until it passes the final test. But yes, you can deploy the flight controls on a test plane first.

  • I had a similar experience. I was contracting for a company that had spent 10 years developing their software in PHP. They weren’t interested in even entertaining the idea of building their latest product in a more productive language or framework, thinking they’d heard it all before. They had no reason to trust what I said over their own instincts. Fair enough, I couldn’t blame them for that.

    One weekend I came into the office and built a significant part of their latest app in Rails, in two days. On day two I invited one of the designers to join me. By Monday morning we had a working demo. Showing the company founders what we’d done was surprisingly nerve wracking, but they saw first hand how technology had moved on. On Monday afternoon they asked me to lead a Rails development group.

    Show. Don’t tell.

    A year later one of the founders told me over a beer that the only way I could have helped them see the light was to show them. He freely admitted I’d never have been able to talk them round, and really appreciated me taking the time to do it.