This article has been contributed by Simon Higginson.
Problem Management is an intriguing discipline within the Service Management suite. The IT department is continually being asked to be proactive, not reactive.
Often in IT we presuppose what our customers in the business require, then give them solutions to issues they didn’t know they had. But what happens when a business customer asks IT for a permanent solution to an issue we might not have known we had, or to one where we know only a sticking-plaster fix is in place?
Your Problem Manager is the key
Step up to the plate, the Problem Manager: the individual focussed on reacting to, and managing, issues that have already happened. They can’t really help but have a reactive mindset, rooted in the analysis of fact. The incident might be closed, but the Problem Manager is the person entrusted with ensuring that appropriate steps are taken so that it doesn’t recur. It can be a stressful role: the systems were down, the company may have lost (and may still be losing) money, and trading has been impacted. People want to know what is being done. So what SLAs can be put in place between the Problem Manager and the service owner to support the Problem Manager’s activities, and perhaps give them some breathing space, whilst still keeping the focus on resolution?
Let’s look at the four problem management SLAs you really can’t live without
#1 – Provision of Problem Management reference number
A simple SLA to get you started. This is simply an acknowledgement by the problem management team that the problem has been logged, referenced and is in the workflow of the team. It provides reassurance that the problem is going to be dealt with.
#2 – Time to get to the root cause of the issue
This is where some breathing space is provided. The message this SLA sends is that incident management and problem management are distinct. Incident management has produced a temporary fix to the issue; now it is problem management’s turn to work out what lay at the heart of the matter: the root cause.
Note that this SLA is about identifying the root cause, not resolving it. Resolution could take significant time and may involve redeveloping code.
The outcome measured by this SLA is a deliverable, perhaps a brief document or even just an email setting out the results of the root cause analysis. Each company will have to decide its own policy on what that deliverable contains, but the SLA measures the time between the formal closure of the incident and the formal delivery of problem management’s root cause analysis.
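To make the start and stop points of this clock concrete, here is a minimal sketch of how a team might record and measure it. The `ProblemRecord` class and its field names are hypothetical and not taken from any particular ITSM tool; the sketch simply assumes the clock starts at formal incident closure and stops when the root cause deliverable is formally provided.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ProblemRecord:
    # Hypothetical record; field names are illustrative only.
    reference: str
    incident_closed_at: datetime
    rca_delivered_at: Optional[datetime] = None

    def rca_elapsed(self, now: datetime) -> timedelta:
        """Time on the root-cause SLA clock.

        The clock starts at formal incident closure and stops when the
        root cause analysis deliverable is formally provided. If nothing
        has been delivered yet, the clock is still running at `now`.
        """
        stop = self.rca_delivered_at if self.rca_delivered_at is not None else now
        return stop - self.incident_closed_at


# Usage: incident closed Monday 09:00, RCA email sent Wednesday 15:30.
p = ProblemRecord("PRB-0001", datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 6, 15, 30))
print(p.rca_elapsed(datetime(2024, 3, 10)))  # 2 days, 6:30:00
```

Keeping the clock open until a delivery timestamp exists means an overdue analysis keeps accruing time rather than silently dropping out of the report.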
#3 – Measurement of provision of Root Cause Analysis documentation. To be provided within X working days of initial notification.
So, you’ve acknowledged receipt of the problem and determined the root cause. The next SLA ensures that a formal document is delivered in a timely fashion. It should follow a set format and lay out the timeline of events that caused the problem and the actions taken to provide a workaround. It should then list all the actions and recommendations needed to fix the problem, each with a clearly identified owner and a realistic completion date. A suggested target is 3 working days for simple problems, and 5 or 10 working days for progressively more complex ones.
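Because this target is expressed in working days, checking compliance means skipping weekends. The sketch below assumes a Monday to Friday week with no public holidays, and uses hypothetical tier names ("simple", "moderate", "complex") for the suggested 3, 5 and 10 day targets.

```python
from datetime import date, timedelta

# Suggested targets from the SLA above; the tier names are hypothetical.
RCA_TARGET_WORKING_DAYS = {"simple": 3, "moderate": 5, "complex": 10}


def working_days_between(start: date, end: date) -> int:
    """Count Monday-Friday days after `start`, up to and including `end`.

    Public holidays are ignored; a real calendar would exclude them too.
    """
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # weekday() is 0-4 for Monday to Friday
            days += 1
    return days


def rca_document_on_time(notified: date, delivered: date, tier: str) -> bool:
    """True if the RCA document arrived within the tier's working-day target."""
    return working_days_between(notified, delivered) <= RCA_TARGET_WORKING_DAYS[tier]


# Usage: notified Friday 1 March 2024.
print(rca_document_on_time(date(2024, 3, 1), date(2024, 3, 6), "simple"))  # True: 3 working days
print(rca_document_on_time(date(2024, 3, 1), date(2024, 3, 7), "simple"))  # False: 4 working days
```

A Friday notification shows why working days matter: a document delivered the following Wednesday is five calendar days late on paper but still meets a 3 working day target.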
#4 – Measurement of progress on root cause analysis actions as agreed (Target dates not to change more than twice)
The previous SLA measured the time taken to produce the root cause analysis. This SLA takes over where that clock stopped.
The root cause analysis will have identified actions that need to be undertaken and implemented to effect a permanent fix to the original issue, allowing the sticking-plaster solution to be superseded.
However, not all resolutions will be equal in complexity, effort and duration, so each action starts with an initial estimate of its target date for live implementation of the permanent fix. Moving the target date is allowed, but this SLA limits how often it can happen (no more than twice) to prevent action timescales from drifting.
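One way to make the "no more than twice" rule auditable is to keep the full history of target dates rather than overwriting them. The sketch below is an illustration under that assumption; the `RcaAction` class, its fields and the example action are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

MAX_TARGET_DATE_CHANGES = 2  # from the SLA: target dates not to change more than twice


@dataclass
class RcaAction:
    # Hypothetical action record; fields are illustrative only.
    description: str
    owner: str
    # The first entry is the initial estimate; later entries are reschedules.
    target_dates: List[date] = field(default_factory=list)

    def reschedule(self, new_target: date) -> None:
        """Record a moved target date, keeping the earlier ones as history."""
        self.target_dates.append(new_target)

    @property
    def changes(self) -> int:
        """Number of times the target date has moved since the initial estimate."""
        return max(len(self.target_dates) - 1, 0)

    def within_sla(self) -> bool:
        return self.changes <= MAX_TARGET_DATE_CHANGES


# Usage with a hypothetical action.
action = RcaAction("Example permanent fix", "Application Support", [date(2024, 4, 1)])
action.reschedule(date(2024, 4, 15))
action.reschedule(date(2024, 5, 1))
print(action.within_sla())  # True: two changes
action.reschedule(date(2024, 5, 20))
print(action.within_sla())  # False: the third change breaches the SLA
```

Appending rather than overwriting also lets the Problem Manager show the service owner exactly when, and how far, each date slipped.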