One of the things I wish was covered in more detail during ITIL intermediate training is how to properly impact assess Changes. Change Managers are the guardians of the production environment, so making sure that all Changes are properly assessed and sanity checked is a key part of service delivery. Assess too low and high risk Changes go through unchallenged; assess too high and you clog up the process by examining every Change, no matter how small, as if it could kill your organisation.
Here are the things that I look for when assessing a Change:
- Does it highlight the affected services so that it's easy to identify in any reports?
- Is it clear and does it make sense? Sounds basic, I know, but let's make it easier for the other people assessing and authorising the Change.
- Why are we doing the Change? Remember, this isn't just about technology; what about business and financial benefits?
- What are the risks in carrying out this Change? Has a risk matrix been used to give it a tangible risk score, or is it a case of "reboot that critical server in the middle of the day? Be grand"? Imagine explaining to senior management what went wrong if the Change implodes: have you looked at risk mitigation? Using a formal risk categorisation matrix is key here. Don't just assume technicians know what makes a Change low risk; one of the key complaints from the business is that IT does not understand their pain. Creating a change assessment risk matrix IN A REPEATABLE FORMAT should be your first priority as a Change Manager. If you can't assess the risk of a Change the same way each time, learning from any mistakes, then you're not doing Change Management. Period.
- Does the proposed timing work with the approved Changes already on the Change Schedule (CS)? Has the Change been clash checked so there are no potential conflicts over services or resources?
- Look at the proposed start and end times. Are they sensible (i.e. not rebooting a business critical server at 9 o'clock on a Monday morning)? Does the implementation window leave time for anything going wrong or for rolling back the Change?
- Are there any special circumstances that need to be considered? I used to work for Virgin Media; we had Change restrictions and freezes on our TV platforms during key times like the Olympics or the World Cup to protect our customers' experience. If you don't know when your business critical times are, ask! The business will thank you for it.
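To make the risk scoring, clash checking, and window sizing above repeatable, they can be sketched in code. This is a minimal illustration rather than an ITIL prescription: the five-point scales, the score thresholds, and the Change field names are all assumptions you would replace with your own organisation's matrix and tooling.

```python
# Sketch of a repeatable change assessment: risk scoring, clash checking
# against the Change Schedule, and implementation-window sizing.
# Scales, thresholds, and field names are illustrative assumptions.
from datetime import datetime, timedelta

LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "certain": 5}
IMPACT = {"minimal": 1, "minor": 2, "moderate": 3, "major": 4, "severe": 5}

def risk_category(likelihood: str, impact: str) -> str:
    """Score = likelihood x impact, mapped onto low/medium/high bands."""
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"

def clashes(proposed, schedule):
    """Scheduled changes that overlap the proposed window AND share a service."""
    hits = []
    for ch in schedule:
        overlap = proposed["start"] < ch["end"] and ch["start"] < proposed["end"]
        shared = set(proposed["services"]) & set(ch["services"])
        if overlap and shared:
            hits.append(ch["id"])
    return hits

def window_fits(start, end, implement, backout, buffer=timedelta(minutes=30)):
    """The window must hold the implementation, a full back-out, and some slack."""
    return (end - start) >= implement + backout + buffer

schedule = [
    {"id": "CHG001", "services": ["email"],
     "start": datetime(2024, 6, 1, 22), "end": datetime(2024, 6, 2, 2)},
]
proposed = {"id": "CHG002", "services": ["email", "crm"],
            "start": datetime(2024, 6, 1, 23), "end": datetime(2024, 6, 2, 1)}

# A mid-day reboot of a critical server scores exactly as you'd hope.
print(risk_category("likely", "major"))  # -> high
print(clashes(proposed, schedule))       # -> ['CHG001']
print(window_fits(proposed["start"], proposed["end"],
                  timedelta(hours=1), timedelta(minutes=30)))  # -> True
```

The point is not the specific numbers but that the same Change, assessed twice, gets the same answer, which is what makes lessons learned actionable.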
The Technical Details:
Services Affected:
- Have all affected services been identified? What about supporting services? Has someone checked the CMS to ensure all dependencies have been accounted for? Have we referenced the Service Catalogue so that business approvers know what they're authorising?
Technical Teams Affected:
- Who will support the Change throughout testing and implementation? Will additional support be needed? What about outside support from external suppliers? Has someone checked the contract to ensure any additional costs have been approved?
User Base Affected:
- Check and check again. The last thing you want is to deploy a Change to the wrong area of the business.
Environments Covered:
- What do you mean, what environments are we covering? Surely the only environment we need to worry about is production, right? Let me share the story of my worst day at work, ever. A long time ago, pre-kids, I worked for a large investment bank in London. A so-called routine code change to one of our most business critical systems (the market data feed to our trading floors) took longer than expected, so instead of updating both the production and DR environments, only production was updated. The implementation team planned to update the DR environment later but got distracted with other operational priorities (i.e. doing the bidding of whichever senior manager shouted the loudest). Fast forward six weeks: a crisis hits the trading floor, the call is made to invoke DR, but we couldn't, because our market data services were out of sync. Cue a hugely stressful two hours where the whole IT organisation and its mum desperately scrambled to find a fix, and an estimated cost to the business of over $8 million. Moral of the story? If you have a DR environment, keep it in sync with production.
- Are there any licensing implications? Don't forget: changes in the number of people accessing a system, the number of CPUs, or (especially) the way in which people work (moving from dev to prod) can have huge impacts on licences.
Pre-Implementation Testing:
- How do we make sure the Change will go as planned? Has the Change been properly tested in an appropriate environment? Has the testing been signed off, and have all quality requirements been met?
Post-Implementation Verification:
- OK, the Change has gone in; how do we make sure everything is as it should be? Is there any smoke testing we can carry out? This is particularly important for transactional services. I once saw a Change go in where everything looked grand, but when customers logged in the next day they couldn't make any changes in their online banking session. I'll spare you the details of the very shouty senior management feedback; let's just say fun was most definitely not had that day. If at all possible, test that everything is working; the last thing you need is a total inability to support usual processes following a Change.
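A post-implementation verification run can be as simple as a named list of checks executed in order, with any failures reported back. A minimal sketch, where the check names and the failing example are illustrative assumptions:

```python
# Minimal smoke-test harness: run each named verification, collect failures.
def run_smoke_tests(checks):
    """checks is a list of (name, zero-arg callable) pairs; returns failed names."""
    failures = []
    for name, check in checks:
        try:
            if not check():
                failures.append(name)
        except Exception:
            failures.append(name)
    return failures

# The online-banking story above: the login page loaded, but the
# transactional path was broken. Test the end-to-end journey, not the front door.
checks = [
    ("login page responds", lambda: True),
    ("customer can update their account", lambda: False),  # illustrative failure
]
print(run_smoke_tests(checks))  # -> ['customer can update their account']
```

In practice each lambda would be a real probe (an HTTP call, a test transaction on a dummy account), but the structure, named checks with a clear pass/fail report, is the part that stops "it looked grand" passing for verification.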
The Implementation Plan:
- Does it make sense, and does everyone involved know what they are meant to be doing and when? If other teams are involved, are they aware, and do we have contact details for them? Are there any dodgy areas where we might need checkpoint calls? Do we need additional support in place, such as extra on call / shift resource or a duty senior manager, to mitigate risk? The plan doesn't have to be fancy; if you need some inspiration I can share some template implementation plans in our members / subscribers area.
Back Out Plan:
- What happens if something goes wrong during the Change? Do we fix on fail or roll back? Are the Change implementers empowered to make a decision, or is escalation needed? If so, are senior management aware of the Change, and will a designated manager / decision maker be available? Can the Change be backed out within the agreed implementation window, or do we need more time? If it looks like restoration work will cause the Change to overrun, warn the business sooner rather than later so they can put mitigation plans or workarounds in place.
Early Life Support:
- What early life support is planned? Are floorwalkers needed? Are extra team members needed on the day to cope with any questions? Have we got defined exit criteria in place?
Is The Service Desk Aware?
- Has someone made the Service Desk aware? Have they been given any training if needed? I know it sounds basic, but only a couple of months ago I had to sit down and explain to an engineer why it was a good idea to let the Service Desk know before any Changes went live. Let's face it: if something goes wrong, the Service Desk will be at the sharp end of things. And speaking as an ex Service Desk manager (a very long time ago, when they were still called Help Desks), there is nothing worse than having to deal with customers suffering the fallout of a Change you know nothing about.
Communications:
- Has the Change been communicated out properly? Do we have nice templates so Change notifications have a consistent look and feel?
- If the business is pushing for a Change to be fast tracked with minimal testing, can you ask them to formally acknowledge the risk by relaxing any affected SLAs?
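On consistent look and feel for notifications: even a standard-library string template goes a long way. A sketch, where every field name is an illustrative assumption rather than a prescribed format:

```python
# A consistent change-notification template using stdlib string.Template.
# All field names and example values are illustrative.
from string import Template

NOTIFICATION = Template(
    "CHANGE NOTIFICATION: $change_id\n"
    "Service(s) affected: $services\n"
    "Window: $start to $end\n"
    "Expected impact: $impact\n"
    "Questions / issues: $contact"
)

message = NOTIFICATION.substitute(
    change_id="CHG0042",
    services="Online Banking",
    start="Sat 01:00",
    end="Sat 03:00",
    impact="Logins unavailable for up to 15 minutes",
    contact="Service Desk x1234",
)
print(message)
```

Because `substitute` raises an error on any missing field, a half-filled notification simply cannot go out, which is exactly the kind of guard rail a template should give you.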
The above list isn't exhaustive, but it's a sensible starting point. There's lots of guidance out there: ITIL has the 7 Rs of Change Management and COBIT has advice on governance. What do you look for when assessing Changes? Let me know in the comments!