A recent Forrester consulting study (commissioned by automation vendor Chef and downloadable from their website at the link above) found that 40% of Fortune 1000 IT leaders report first time change success rates below 80% (or they simply didn’t know what the first time change success rate was at all), with another 37% stating their first time change success rate was between 80% and 95%.
In the same study, 69% of these same Fortune 1000 IT leaders report it takes them more than a week to make infrastructure changes, and an equal 69% report that it takes them more than a week to release application code into production (mind you that’s not to develop, test, and release the code, but just release code that’s already been written and tested!). Finally, 46% report that more than 10% of their incidents were self-inflicted from IT changes and, shockingly, 31% say they don’t even know what percentage of incidents are caused by changes!
Why Otherwise Capable IT Leaders Struggle with Change
What is going on in these IT shops to produce such bad numbers? Based on my experience with a number of Fortune 1000 IT organizations, I’d like to think that these study participants are just as smart and capable as the IT leaders and professionals I regularly meet with. They are well educated, very experienced (as are their teams), and nearly all of them have some form of change process, change management software, and a change advisory board to assess risks before changes are made. So, why isn’t this enough to produce better results?
I submit that there are two problems, which are actually related to each other.
- Our environments have become extremely complex. The dependencies and relationships across multi-tiered applications / business services are way more than what one individual can know fully, no matter how talented they are or how long they’ve been working there. Trends like virtualization, agile development, cloud, mobile, big data, etc. are making this even harder as IT moves faster and faster to respond to business needs and as innovative new technologies proliferate.
- We aren’t effectively capturing the input from multiple perspectives during the change planning process, so we aren’t effectively identifying and mitigating risks.
Think about how a typical change and release planning process goes. It starts with a request for a change and a change planner filling out an electronic form about it. They assign various people to review and approve the change and this step might include consulting with a spreadsheet, perhaps looking at Configuration Item (CI) information in a CMDB, and maybe calling a meeting or sending out an email or two. In a lot of cases, those selected to participate in the review will include managers or more senior roles who don’t have a very good working knowledge of the operational environment, so they consult with their teams (or at least we hope they will) and eventually the change gets brought forward to the Change Advisory Board (CAB) for a formal approval. It may have taken a week, two weeks, a month or more just to get to this point.
Then the CAB, which is often made up of even more senior people, reviews the planned change. Often one of the CAB members will recognize that a key team or expert wasn’t included in the review process and “kicks it back” for further input, and the change approval is rescheduled to a future CAB meeting. Equally often there’s a lot of pressure from the business to make the change happen right away: it could be a new application release the business has been waiting for, it could be a security fix and “we just can’t allow ourselves to be exposed by delaying it”, or maybe it’s just a firmware upgrade to a router and the vendor has said “it’s no big deal”. So the CAB says “go” and hopes everything works out okay, but a lot of times it simply doesn’t.
People Are Both The Problem And The Answer
By now you may have guessed that the way we engage people in the change process is not only the problem, but it’s also the solution. There’s a great quote from the MIT artificial intelligence expert, Marvin Minsky, that I think is very relevant here: “You don’t really understand something until you understand it more than one way.”
This is, in effect, what we try to do by assigning multiple reviewers and approvers to a change request. The problem is that we often guess about who the best people are to involve, so we either oversubscribe the list and inundate people with emails and meetings, or we undersubscribe and leave out key individuals.
The information these people have to work from is also very fragmented. Yes, we have our CMDBs and CI information, but they’re often incomplete and not always trusted, so people fall back on their tribal knowledge, which may also be incomplete and out of date. A lot of the time, we might intentionally leave out groups because we think involving them will slow things down: “Do we really need to involve the network team on a SAN upgrade? Why do we need security involved in a database patch?” Yet the network might have a direct impact on the success of the SAN upgrade, because we might need to optimize network device settings to handle additional load to the SAN. That database we’re patching might contain sensitive customer data, and the right patch procedure had better be followed or we’ll create a compliance problem. So if we leave out people who may be necessary, we create unexpected ripple effects from our changes too.
Engaging Relevant Experts to Collaborate Is The Key
I suggest that there are two things we need to do in order to better engage the right people so we can improve first time change success rates, reduce the time it takes to execute changes, and reduce incidents from changes:
- We need to know up front who the right people are to involve (and who not to involve), so we can be sure we include all the right perspectives without unnecessarily pulling people off of what they are already working on
- We need to arm those we involve with accurate information about upstream and downstream dependencies so they can make informed and quicker recommendations
As an industry, this is what we should be focused on rather than whether a strict approval process alone was followed. By enabling our experts to opt in to the things they are responsible for and care about, they can be automatically identified and engaged when it comes time to plan a change. We also need to take a lesson from academic journals and apply a peer review process to our CMDB data so we can increase trust in its use and fill in the gaps with the tribal knowledge of our experts, validating that both sets of information are accurate and up to date. With this type of approach, we can have a much stronger basis for smarter change decision-making. This is exactly the type of approach we’re taking in my organization, and I invite you to check out what the ITSM Review team has to say about it.
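To make the opt-in idea a little more concrete, here is a minimal sketch in Python of how reviewers might be identified automatically from dependency data. It is an illustration only, not a description of any particular product: the item names, the team names, the `DOWNSTREAM` and `SUBSCRIPTIONS` tables, and the `reviewers_for_change` function are all hypothetical, and a real CMDB would hold far messier data.

```python
from collections import deque

# Hypothetical dependency map: each item lists the items that depend on it.
DOWNSTREAM = {
    "core-switch-01": ["san-array-01"],
    "san-array-01": ["db-cluster-01"],
    "db-cluster-01": ["billing-app"],
    "billing-app": [],
    "hr-portal": [],
}

# Hypothetical opt-in list: the items each team has said it cares about.
SUBSCRIPTIONS = {
    "network-team": {"core-switch-01"},
    "storage-team": {"san-array-01"},
    "dba-team": {"db-cluster-01"},
    "security-team": {"db-cluster-01"},
    "billing-owners": {"billing-app"},
    "hr-app-owners": {"hr-portal"},
}


def invert(graph):
    """Flip the edges so each item lists what it depends on (upstream)."""
    upstream = {node: [] for node in graph}
    for node, dependents in graph.items():
        for dependent in dependents:
            upstream.setdefault(dependent, []).append(node)
    return upstream


def reachable(graph, start):
    """Breadth-first walk: every item reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def reviewers_for_change(target):
    """Teams that opted in to anything upstream or downstream of target."""
    affected = reachable(DOWNSTREAM, target) | reachable(invert(DOWNSTREAM), target)
    return sorted(team for team, items in SUBSCRIPTIONS.items() if items & affected)


print(reviewers_for_change("san-array-01"))
print(reviewers_for_change("hr-portal"))
```

In this toy data set, a change to the SAN pulls in the network team (upstream) and the database, security, and billing teams (downstream), while the owners of the unrelated HR portal are left alone. That is the point of the exercise: the right perspectives are included and nobody else is interrupted.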