Wednesday, June 02, 2010

The Zen of Debugging

A senior engineer used to sit in the cubicle next to mine. He had an anecdote pasted on the wall of his cubicle, put there to educate newbies like me. Thought I’d share the wisdom. The story goes like this.

General Motors once received a complaint from a customer who claimed that his new Pontiac was allergic to vanilla ice cream. The customer wrote, “This is the second time I have written you, and I don't blame you for not answering me, because I kind of sounded crazy, but it is a fact that we have a tradition in our family of ice cream for dessert after dinner each night. But the kind of ice cream varies, so every night, after we've eaten, the whole family votes on which kind of ice cream we should have and I drive down to the store to get it. It's also a fact that I recently purchased a new Pontiac and since then my trips to the store have created a problem. You see, every time I buy vanilla ice cream, when I start back from the store my car won't start. If I get any other kind of ice cream, the car starts just fine.” So GM sent an engineer from the product team to check it out.

The engineer was surprised to be greeted by a successful, obviously well-educated man in a fine neighbourhood. He had arranged to meet the man just after dinner time, so the two hopped into the car and drove to the ice cream store. It was vanilla ice cream that night and, sure enough, after they came back to the car, it wouldn't start. The engineer returned for three more nights. The first night, the man got chocolate. The car started. The second night, he got strawberry. The car started. The third night he ordered vanilla. The car again failed to start.

All the while, the engineer was taking down all sorts of data: time of day, type of gas used, time spent at the store before restarting, and so on. He noticed a correlation: on the days when the customer bought vanilla ice cream and the car failed to start, the time spent at the store was much shorter. So the engineer rephrased the problem: why is it that the car will not restart within a short period of time after the engine is turned off?

Once the problem was defined in terms of restart interval, and not ice cream flavour, the experienced engineer immediately realized vapour lock was causing the issue: after a short stop the engine was still too hot, and vaporized fuel in the line kept it from restarting. (The engineer later learned that vanilla, being the most popular flavour, was in a separate case at the front of the store for quick pickup. All the other flavours were kept in the back of the store at a different counter where it took considerably longer to find the flavour and get checked out.)

So how did this engineer analyse the root cause of a seemingly impossible bug? To start with, he did not dismiss the issue as being impossible or the customer as being a jerk. He diligently took data without jumping to conclusions or offering a quick-fix workaround. He knew what parameters to look for. He saw a pattern in the data. Finally, he defined the problem: restart interval. The rest was trivial.

Defining the problem, asking the right question: that is the key to problem solving, the Zen of Debugging.

PS: In my first year at work, I remember being in the office at about 3 am trying to debug a particularly tricky issue that had been around for a week and was threatening to kill my maiden feature. That’s when I first noticed this anecdote pasted in my neighbour’s cubicle. It helped me to step back, relax, see the facts and rephrase the problem. The issue was fixed before sunrise. Hope others find this useful too.