By James Kwak
Yesterday the Obama administration announced that healthcare.gov “will work smoothly for the vast majority of users.” Presumably they intended this as some sort of victory announcement after their self-imposed deadline of December 1 to fix the many problems uncovered when the site went live two months ago. But anyone who knows anything about software knows that it’s not enough to “work smoothly” for the “vast majority” of users.
Apparently pages are now loading incorrectly less than 1 percent of the time. Well, how much less? Pages failing 1 percent of the time make for a terrible web experience, especially for a website where you have to travel through a long sequence of pages, because the chance of hitting at least one failure grows with every page you load. There is evident fear that the current site will not be able to handle any significant load, such as the surge it will get around the deadline to sign up for policies beginning on January 1. And we know that “the back office systems, the accounting systems, [and] the payment systems” (in other words, the hard stuff) are still a work in progress.
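To put a rough number on the long-sequence problem: if each page load fails independently with probability p, the chance of getting through n pages cleanly is (1 - p)^n. The sketch below is only an illustration. The independence assumption, the flat 1 percent rate, and the page counts are all assumptions, not figures reported for healthcare.gov.

```python
def chance_of_clean_session(page_failure_rate: float, pages: int) -> float:
    """Probability that every page in a session loads correctly,
    assuming each page fails independently at the same rate."""
    return (1.0 - page_failure_rate) ** pages


if __name__ == "__main__":
    # Hypothetical session lengths; the real number of pages is not known here.
    for pages in (10, 30, 50):
        ok = chance_of_clean_session(0.01, pages)
        print(f"{pages} pages: {ok:.1%} of sessions finish without an error")
```

Under these assumptions a 10-page session finishes cleanly about 90 percent of the time, a 30-page session about 74 percent, and a 50-page session only about 60 percent.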
None of this should come as any surprise—except to the politicians, bureaucrats, and campaign officials who run healthcare.gov. The single biggest mistake in the software business is thinking that if you throw resources at a problem and work really, really hard and put lots of pressure on people, you can complete a project by some arbitrary date (like December 1). It’s not like staying up all night to write a paper in college. This isn’t just a mistake made by people like the president of the United States. It’s made routinely by people in the software business, whether CEOs of software companies who made their way up through the sales ranks, or CIOs of big companies who made their way up as middle managers. You can’t double the number of people and cut the time in half. And just saying something is really, really important won’t make it go any faster or better.
Clearly all sorts of things were wrong before October 1 (and not just because they were relying on Oracle to do something other than supply a database). According to the Times, the website “had barely been tested before it went live,” which is a sure recipe for disaster. Back in my day, every feature was supposed to be finished three months before release. I know web companies do things differently today, but when it comes to performance they already know they can handle the load, and I doubt they cut corners when it comes to software that handles financial transactions. If you don’t have time to test, you shouldn’t ship. It’s that simple. Anything else is just wishful thinking.
It seems like healthcare.gov had at least two huge problems at launch. The first was performance—the ability of the system to deliver pages quickly when under load. I don’t have any insider information, but from the outside it sounds like a lot of what they are doing is switching hardware around, increasing the bandwidth at certain key chokepoints, and firing their hosting company. That’s all good, but performance is only secondarily a hardware issue. The software has to be designed properly to be scalable—so that adding twice the hardware will allow it to support twice as many users. If not, you need to scrap it and start from scratch. I can’t tell from the outside (and I couldn’t even tell from the inside) if it’s designed properly in this case, but I sure hope so, because otherwise no amount of hardware shuffling will do the trick.
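One common way to see why hardware alone may not help is Amdahl's law, brought in here purely as an illustration and not as a description of how healthcare.gov is actually built. It assumes some fraction of each request must pass through a single shared resource (say, one database lock) and so cannot be spread across servers. The 20 percent figure below is hypothetical.

```python
def capacity_multiplier(servers: int, serial_fraction: float) -> float:
    """Amdahl's law: how many times more load `servers` machines can carry
    than one machine, when `serial_fraction` of the work cannot be
    spread across machines."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / servers)


if __name__ == "__main__":
    for serial in (0.0, 0.2):  # 0.2 is a made-up value for illustration
        for servers in (2, 4, 100):
            gain = capacity_multiplier(servers, serial)
            print(f"serial={serial:.0%} servers={servers}: {gain:.2f}x capacity")
```

With nothing serialized, doubling the servers doubles capacity. With a hypothetical 20 percent serialized, doubling yields only about 1.67 times the capacity, and no amount of hardware gets past 5 times.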
The other problem was data integrity. When you’re dealing with financial transactions, it’s really, really important that the data don’t get messed up between the two counterparties. But it seems like, at a minimum, customer records weren’t making it through to the insurers. It sounds like fixing that mess has been deferred until later. It could be as simple a problem as bad data mapping between one data model and another. But fixing these problems involves another software quality issue. With high-quality code, it’s relatively easy to find and fix these errors. With bad code, it’s hard to find bugs and it’s harder to fix them without destabilizing the rest of the system. Again, let’s hope for the former.
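As a sketch of what "data mapping between one data model and another" can look like, the example below translates a marketplace-side record into an insurer-side record and then checks that what the insurer received matches what was sent. Every class, field, and function name here is hypothetical. Nothing is known from the outside about the real record formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeApplicant:  # hypothetical marketplace-side record
    applicant_id: str
    full_name: str
    birth_date: str  # ISO date string, e.g. "1980-02-29"
    plan_code: str


@dataclass(frozen=True)
class InsurerEnrollment:  # hypothetical insurer-side record
    member_ref: str
    name: str
    dob: str
    plan: str


def to_insurer(a: ExchangeApplicant) -> InsurerEnrollment:
    """Map one data model onto the other, field by field."""
    return InsurerEnrollment(
        member_ref=a.applicant_id, name=a.full_name, dob=a.birth_date, plan=a.plan_code
    )


def unreconciled(sent, received):
    """Return IDs of applicants whose record is missing or altered on the insurer side."""
    by_ref = {r.member_ref: r for r in received}
    return [a.applicant_id for a in sent if by_ref.get(a.applicant_id) != to_insurer(a)]
```

A reconciliation check like `unreconciled` is cheap to run after every batch, and it turns "records weren't making it through" from a customer complaint into a list of specific IDs to investigate.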
I’m not technically skilled enough to be the type of person you would want making decisions about this mess, and I don’t know anything more than you can read in the newspaper. But when custom software projects go this badly, I think that in general (meaning more than half the time) you are better off cutting your losses and starting over. Obviously there are administrative and political reasons why the Obama administration can’t do that. We know that this project has to succeed like few other projects in history, and it will get there one way or another. But there’s no magic bullet, and neither hope nor trying harder is a viable strategy.