Monday, August 5, 2013

Incident and Problem: What is the Difference?

One nice thing about ITIL is that it provides definitions for almost everything in IT service management. First things first: we need to define these two terms.
  • An incident is an unplanned disruption or degradation of service.
  • A problem is a cause of one or more incidents.

    Quite often, these two terms are used interchangeably. This causes a lot of confusion. Sometimes people will add another term, "issue," to mean the same thing.

    What is an Incident? 

    Based on the definition provided, an incident is something that needs to be resolved immediately. This can be through a permanent fix, a workaround or a temporary fix.
    A server crash would be an example of an incident if it causes a disruption in the business process. If a server is used only during office hours, a crash after office hours is, strictly by the definition, not yet an incident since no service was affected. It becomes an incident only when the outage extends into the hours of use.
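
The office-hours rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 09:00 to 17:00 service window, and the choice to treat every calendar day as a working day are mine, not part of ITIL.

```python
from datetime import datetime, time, timedelta

def service_time_lost(outage_start, outage_end, opens=time(9), closes=time(17)):
    """Return how much of the outage falls inside the daily service hours.

    Assumption: the same hours apply every day (weekends are not excluded).
    """
    lost = timedelta(0)
    day = outage_start.date()
    while day <= outage_end.date():
        window_start = datetime.combine(day, opens)
        window_end = datetime.combine(day, closes)
        overlap_start = max(outage_start, window_start)
        overlap_end = min(outage_end, window_end)
        if overlap_end > overlap_start:
            lost += overlap_end - overlap_start
        day += timedelta(days=1)
    return lost

def is_incident(outage_start, outage_end, opens=time(9), closes=time(17)):
    """An outage counts as an incident only if it overlaps the hours of use."""
    return service_time_lost(outage_start, outage_end, opens, closes) > timedelta(0)
```

With these assumptions, a crash at 20:00 that is restored by 07:00 the next morning loses no service time and is not an incident, while the same crash restored at 10:30 loses an hour and a half of service and is one.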

    If a disruption is planned, such as scheduled maintenance, it is not an incident. The outage should not be counted as part of the unavailability. If the outage runs past the planned schedule, the time beyond the planned window becomes an incident.
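
As a rough sketch of that accounting, the function below splits a maintenance outage into the planned part, which is not counted as unavailability, and the overrun, which is treated as an incident. The function and parameter names are hypothetical, and the sketch does not check whether the overrun falls inside service hours.

```python
from datetime import datetime, timedelta

def split_outage(planned_start, planned_end, actual_end):
    """Split a maintenance outage into (planned time, unplanned overrun).

    Only the overrun counts toward unavailability in this sketch.
    """
    planned = min(actual_end, planned_end) - planned_start
    overrun = max(actual_end - planned_end, timedelta(0))
    return planned, overrun
```

For a window planned from 01:00 to 03:00 that actually ends at 03:45, this gives two hours of planned outage and a 45-minute overrun to record as an incident.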

    If an incident requires changes, the emergency change process is normally followed, especially if the service level is critical.

    What is a Problem? 

    Problems are not incidents. An incident can raise a problem, especially if there is a high possibility that the incident might happen again. In the case of a server crash after office hours, the crash is a problem. This is a high-priority problem because if it is not resolved before the scheduled availability, it becomes an incident.

    An incident does not become a problem because they are two different things. A problem may be raised because of an incident and, as we've seen in the previous example, a problem may cause an incident. You may raise a problem ticket and link it to an incident.

    The root cause of the problem may be known or not known. In any case, the following actions may be taken for problems:

    1. Do nothing if the problem does not affect the business, or if the cost of fixing the problem exceeds its benefits.
    2. Deploy a workaround if the cost of determining the root cause exceeds the benefits.
    3. Determine the root cause and fix the problem if the benefit is worth it.
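
One possible reading of these three options as a decision rule is sketched below. The numeric cost and benefit inputs, the `workaround_cost` parameter, and the order in which the checks run are all my assumptions; the list above does not say how the comparisons should be ranked.

```python
from enum import Enum

class ProblemAction(Enum):
    DO_NOTHING = "do nothing"
    WORKAROUND = "deploy workaround"
    FIX_ROOT_CAUSE = "determine root cause and fix"

def choose_action(affects_business, benefit, workaround_cost, root_cause_fix_cost):
    """Pick an action for a problem by comparing costs against the benefit.

    Assumed ordering: prefer the permanent fix when it pays off, fall back
    to a workaround when only that pays off, otherwise leave the problem.
    """
    if not affects_business:
        return ProblemAction.DO_NOTHING
    if root_cause_fix_cost <= benefit:
        return ProblemAction.FIX_ROOT_CAUSE
    if workaround_cost <= benefit:
        return ProblemAction.WORKAROUND
    return ProblemAction.DO_NOTHING
```

The point of the sketch is only that each option comes down to a cost-versus-benefit comparison; a real shop would set its own thresholds and ordering.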

    Incident vs Problem: What is the Difference?

    To illustrate this further, let's take a practical example.

    You are driving your car and you get a flat tire. This is an incident because it disrupted the service: transportation to a destination. You fix this by either changing the tire yourself or calling roadside assistance. Once the tire has been changed, the incident is closed. But now you have a problem: you are running on your spare tire.

    To fix the problem, you need to repair the flat tire and put it back.

    Another example would be that you are driving on an almost bald tire. This is a problem. If you continue to drive your car with that bald tire, you are bound to have an incident.

    Normally, an incident needs to be fixed within a specific timeline. Problems can be left indefinitely until an incident happens.

    Questions to Help Identify Incident and Problem 

    I work in a maintenance shop and, quite often, there is much discussion on whether something is an incident or a problem. There is only one question to ask: Should this be fixed now? Of course, when you talk to some people, they will always say yes. So to help me further, I ask the following questions:
    1. Is the service unusable?
    2. Is there a degradation of the service?
    3. Is the business process affected?
    4. Are service levels affected?
    If you answer yes to any of these questions, it is probably an incident.
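
The four questions can be written down as a small checklist. The sketch below is a minimal illustration: the class and field names are hypothetical, and labeling everything else a "problem" is my simplification of the incident-or-problem discussion described above.

```python
from dataclasses import dataclass

@dataclass
class Triage:
    service_unusable: bool = False
    service_degraded: bool = False
    business_process_affected: bool = False
    service_levels_affected: bool = False

def classify(answers):
    """Return "incident" if any question is answered yes, else "problem"."""
    if any((
        answers.service_unusable,
        answers.service_degraded,
        answers.business_process_affected,
        answers.service_levels_affected,
    )):
        return "incident"
    return "problem"
```

In the tire examples, the flat tire would be `Triage(service_unusable=True)` and classify as an incident, while the almost bald tire answers no to all four questions and stays a problem.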