But the impact is huge, and organizations can lose money when these incidents happen. Of course, the goal is to prevent them, but we all know that they do happen, so we need to prepare for them.

The most important thing to have is a process for dealing with critical incidents. The process should cover the following:
- Logging. For critical incidents, you will want to keep a record for several reasons:
  - Determine the duration of the incident.
  - Keep track of the status of the incident.
  - Keep a record of what was done to remedy the incident. This will be helpful should a similar incident happen again.
  - Help determine the root cause during problem management.

  (See my article on why you need to log Incidents)
- Identify Impact. You need well-defined criteria to determine whether an incident is critical. One way to do this is to measure the impact against your SLA. The more impact the incident has on your SLA, the more critical it is.
- Escalation. How will you raise the profile of this incident so it gets attention at the appropriate level? If the entire organization is impacted, you want upper management to be aware of it. This means you need contact lists that show whom to call. Keep the numbers of the technical lead, the technical manager, and the operations or business managers conveniently available so they can be contacted.
- Stakeholder communication. You also need to know how stakeholders are informed and who should inform them. Ultimately, an incident affects your clients. Having a process that clearly defines who will call whom, and by what method, is critical. Of course, client contact information should be available. The communication plan should also consider the possibility that the email system is unavailable: it is useless to send an email to stakeholders if the email system is down.
- Coordination. Someone needs to be appointed as the situation manager to coordinate the incident. This person chairs the war room and is responsible for coordinating all efforts. With modern technology, a 'war room' does not need to be a physical room; you can have a teleconference number that people dial in to. This information should be disseminated to everyone, and people should be trained on how to use it.
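
To make the logging and impact items above more concrete, here is a minimal sketch of what an incident record could look like in Python. The class name, the field names, and especially the `is_critical` rule (critical once an incident consumes half of the downtime the SLA allows) are assumptions made for illustration; your own criteria should come from your SLA.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Incident:
    """A minimal incident record (names are illustrative, not a standard)."""
    summary: str
    opened_at: datetime
    status: str = "open"
    resolved_at: Optional[datetime] = None
    # Each entry is (timestamp, note): a status change or a remedy that was tried.
    actions: List[Tuple[datetime, str]] = field(default_factory=list)

    def log_action(self, when: datetime, note: str) -> None:
        """Record what was done, so a similar incident can reuse the remedy."""
        self.actions.append((when, note))

    def resolve(self, when: datetime, note: str) -> None:
        """Log the final action and mark the incident as resolved."""
        self.log_action(when, note)
        self.status = "resolved"
        self.resolved_at = when

    def duration(self, now: datetime) -> timedelta:
        """Time from opening to resolution, or to `now` if still open."""
        end = self.resolved_at if self.resolved_at is not None else now
        return end - self.opened_at


def is_critical(downtime_minutes: float, sla_allowed_minutes: float,
                threshold: float = 0.5) -> bool:
    """Hypothetical rule: critical once the incident uses up at least
    `threshold` of the downtime the SLA allows for the period."""
    return downtime_minutes >= threshold * sla_allowed_minutes


start = datetime(2024, 1, 1, 9, 0)
incident = Incident("Email system down", opened_at=start)
incident.log_action(start + timedelta(minutes=10), "Restarted mail server")
incident.resolve(start + timedelta(minutes=45), "Failed over to standby server")
print(incident.status, incident.duration(now=start + timedelta(hours=2)))
```

Keeping the actions as timestamped notes is what lets the same record answer all four questions in the logging item: duration, status, what was done, and input for root cause analysis.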
Sample Incident Management Process
The table below is just an illustration of what a procedure should look like. You may use it as a template.
Critical Incident Procedure
Role | Action |
---|---|
User | Informs supervisor. |
Supervisor | Determines this is a critical incident. Call Service Desk (555-5555) to report incident. Make sure to tell them this is a critical incident. |
Service Desk | Log incident and initiate conference bridge: 1-800-555-2222 (moderator number: 9999999; participant number: 777777). Send out email to criticalincident distribution list. If email is not working, send text message to 333-222-4444. If text messaging system is not working, call numbers listed in the contact list. |
Technical Lead | Contact user to determine the situation and brief technical manager on assessment and ETA on when the incident will be fixed. |
Technical Manager | Work with Technical Lead to assess incident and dial in to conference bridge. Act as Situation Manager. |
Business Manager | Assess impact to business and dial in to conference bridge. |
Communication Lead | Dial in to conference bridge and determine who to contact and how to contact them. |
Technical Lead | Works on the incident with technical team. |
Technical Manager | Provide regular updates on the situation. |
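
The Service Desk step in the procedure describes a fallback chain: email first, then text message, then phone calls. If you automate any of this, the logic could be sketched roughly as follows. The sender functions here are stand-ins that only simulate an email outage; they are assumptions for illustration and would need to be replaced with whatever your mail, SMS, and telephony systems actually provide.

```python
from typing import Callable, List, Tuple

# A channel is a name plus a function that tries to deliver the message
# and returns True on success.
Channel = Tuple[str, Callable[[str], bool]]


def notify_with_fallback(message: str, channels: List[Channel]) -> str:
    """Try each channel in order and return the name of the first that works."""
    for name, send in channels:
        try:
            if send(message):
                return name
        except Exception:
            # Treat an error as "this channel is unavailable" and move on.
            continue
    raise RuntimeError("All notification channels failed")


# Placeholder senders (hypothetical): pretend the email system is down.
def send_email(message: str) -> bool:
    return False


def send_text(message: str) -> bool:
    return True


def call_contact_list(message: str) -> bool:
    return True


channels = [
    ("email", send_email),
    ("text", send_text),
    ("phone", call_contact_list),
]
used = notify_with_fallback("Critical incident: join the conference bridge", channels)
print(used)  # text
```

Listing the channels in order of preference keeps the procedure and the code in step: changing the fallback order means reordering one list.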
Sample Contact List
The contact list may look something like this:
Name | Position | Number |
---|---|---|
John Black | Technical Lead | 394-2390 |
Sue Perez | Technical Manager | 394-2391 |
Joseph Young | Business Manager | 394-2392 |
Pauline Lee | Communication Lead | 394-2393 |
Marie Jones | External Client | 230-2034 |
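
If the contact list is kept in a machine-readable form, looking up whom to call during an escalation becomes a simple query. The sketch below uses the sample names and numbers from the table above; the `numbers_for` helper and the escalation order are assumptions chosen for illustration.

```python
from typing import Dict, List

# The sample contact list from the table above.
CONTACTS: List[Dict[str, str]] = [
    {"name": "John Black", "position": "Technical Lead", "number": "394-2390"},
    {"name": "Sue Perez", "position": "Technical Manager", "number": "394-2391"},
    {"name": "Joseph Young", "position": "Business Manager", "number": "394-2392"},
    {"name": "Pauline Lee", "position": "Communication Lead", "number": "394-2393"},
    {"name": "Marie Jones", "position": "External Client", "number": "230-2034"},
]


def numbers_for(positions: List[str]) -> List[str]:
    """Return phone numbers for the given positions, in the order requested.
    Positions that are not in the contact list are skipped."""
    by_position = {c["position"]: c["number"] for c in CONTACTS}
    return [by_position[p] for p in positions if p in by_position]


# Hypothetical escalation order for an internal critical incident.
escalation_order = ["Technical Lead", "Technical Manager", "Business Manager"]
print(numbers_for(escalation_order))  # ['394-2390', '394-2391', '394-2392']
```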