Saturday, April 7, 2018

Why Log Incidents

Incident logging is critical for IT Service Management for the following reasons:
  1. The first and perhaps most obvious reason is that you want to know whether there are pending incidents that need to be resolved. When supporting a service, you want to make sure that everything is working as it should. You would not want to miss an incident that affects, say, a major service used by your company.
  2. The second reason is to make sure someone is assigned to each incident. This is especially important for critical incidents because you want someone on the ball at all times. You would not want to walk into an incident war room and be unable to tell the status of an incident. For critical incidents, communication is essential, and knowing the status of an incident and its expected resolution time will certainly help calm nervous managers.
  3. The third reason is that you want metrics for measuring performance. A log of incidents will tell you a lot.
    • It tells you which components are stable and which are not. If an incident hits the same component, say, every day, then there is an underlying problem that needs to be resolved.
    • It can give you an idea of how fast an incident was closed, which can lead you to look at possible problem areas in incident resolution. If, say, one incident took a lot more time to close than others of the same kind, that could indicate areas for improvement for that kind of incident.
    • If you have a history of who was assigned at specific points in resolving the incident, it will tell you the performance of each member. If one member always takes longer than others to resolve similar incidents, this could indicate performance issues that could be addressed either by training or by other intervention activities.
    • By using the information in the incident log, you can identify trends that could lead to improved performance. If most incidents happen at a specific time of day across several days, this could indicate that the problem is wider than expected.
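
To make the metrics above concrete, here is a minimal sketch in Python of how they might be pulled out of an incident log. Everything in it is an assumption for illustration: the sample rows, the component and assignee names, the timestamp format, and the helper names `hours_to_close` and `summarize` are all made up, and a real ticketing tool would export its data in its own shape.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample log: (component, assignee, reported, closed).
LOG = [
    ("firewall-a", "ana", "2018-04-02 09:10", "2018-04-02 10:10"),
    ("firewall-a", "ben", "2018-04-03 09:05", "2018-04-03 12:05"),
    ("mail", "ana", "2018-04-03 14:30", "2018-04-03 15:00"),
    ("firewall-a", "ana", "2018-04-04 09:20", "2018-04-04 10:20"),
]

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def hours_to_close(reported, closed):
    """Return the time between report and closure, in hours."""
    delta = datetime.strptime(closed, FMT) - datetime.strptime(reported, FMT)
    return delta.total_seconds() / 3600


def summarize(log):
    """Count incidents per component and per hour of day,
    and average the time to close per assignee."""
    per_component = Counter()
    close_times = defaultdict(list)
    per_hour = Counter()
    for component, assignee, reported, closed in log:
        per_component[component] += 1
        close_times[assignee].append(hours_to_close(reported, closed))
        per_hour[datetime.strptime(reported, FMT).hour] += 1
    mean_close = {who: mean(times) for who, times in close_times.items()}
    return per_component, mean_close, per_hour


per_component, mean_close, per_hour = summarize(LOG)
print(per_component.most_common(1))  # least stable component
print(mean_close)                    # average hours to close, per assignee
print(per_hour.most_common(1))       # hour of day with the most incidents
```

With this sample data, the firewall shows up as the least stable component and the 9 o'clock hour as the busiest, which is the kind of trend the log is meant to surface. Treat the per-assignee average with care: it only means something when the incidents being compared are similar.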

What Do You Log?

This raises the question of what should be logged in an incident ticket. The following are some of the major items to include in the incident log. The list is not exhaustive:
  1. Date and time when the incident was reported and closed.
  2. Short description of the incident as a title. This must be descriptive enough to give a summary of the incident. A title like 'Network outage' is not as clear as 'Network outage due to firewall A failure'.
  3. Initial priority to be assigned to the incident. This will help determine how critical the incident is. The priority of an incident may change as it goes through its life cycle. An incident may at first be identified as critical, for example because it appears to affect the entire company, but after verification it may turn out that only a small number of users are affected.
  4. Name of the person who reported the incident and the name of the person who created the ticket. Having this information allows you to follow up with them for more details if needed.
  5. Longer description of the incident. Here, you may put details of the incident such as error logs and error codes. The more relevant information there is, the better it is for the person resolving it.
  6. Person assigned to the incident. This is critical so you will know who to contact to follow up on the status.
  7. Status of the incident: whether it is open, under investigation, resolution being developed, resolution being deployed, resolution successfully deployed, or closed. This gives people a quick glance at the status of the incident.
  8. Resolution. This is very critical because it will help with future incidents, and it is one of the main purposes of keeping an incident log. If the resolution is written clearly and in detail, it will help reduce the resolution time for similar future incidents. I cannot say how much time and effort has been saved by being able to refer to past resolutions.
  9. Attachments. These could be emails regarding the incident, such as technical discussions, minutes of meetings, decisions made, approvals, etc. They may prove helpful for documenting the incident and for audit purposes.
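
The items above map naturally onto a simple record. Below is a rough sketch in Python of what such a ticket could look like. The class and field names (`IncidentTicket`, `Status`, `close`), the numeric priority scale, and the sample values are assumptions made for illustration, not the schema of any particular ticketing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Status(Enum):
    """The statuses listed in item 7."""
    OPEN = "open"
    UNDER_INVESTIGATION = "under investigation"
    RESOLUTION_BEING_DEVELOPED = "resolution being developed"
    RESOLUTION_BEING_DEPLOYED = "resolution being deployed"
    RESOLUTION_DEPLOYED = "resolution successfully deployed"
    CLOSED = "closed"


@dataclass
class IncidentTicket:
    title: str                      # item 2: short, descriptive summary
    description: str                # item 5: error logs, error codes, details
    priority: int                   # item 3: assumption, 1 = most critical
    reported_by: str                # item 4
    created_by: str                 # item 4
    reported_at: datetime           # item 1
    assigned_to: Optional[str] = None       # item 6
    status: Status = Status.OPEN            # item 7
    resolution: Optional[str] = None        # item 8
    closed_at: Optional[datetime] = None    # item 1
    attachments: List[str] = field(default_factory=list)  # item 9

    def close(self, resolution: str, closed_at: datetime) -> None:
        """Close the ticket; refuse to close without a written resolution."""
        if not resolution.strip():
            raise ValueError("a closed incident needs a written resolution")
        self.resolution = resolution
        self.closed_at = closed_at
        self.status = Status.CLOSED


# Hypothetical usage with made-up values.
ticket = IncidentTicket(
    title="Network outage due to firewall A failure",
    description="Users cannot reach external sites; firewall A is unresponsive.",
    priority=1,
    reported_by="Service desk caller",
    created_by="Service desk agent",
    reported_at=datetime(2018, 4, 7, 9, 15),
    assigned_to="Network engineer on duty",
)
ticket.priority = 3  # lowered after verification showed few users affected
ticket.close("Failed over to the standby firewall.", datetime(2018, 4, 7, 11, 45))
print(ticket.status.value)
```

Requiring a resolution before a ticket can be closed is one design choice that follows from item 8: if the resolution is the most valuable part of the log, the tool should not let it be skipped.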