How predictive analytics have turned Incident Management on its head

Predictive analytics is set to turn the world of IT service management, and in particular Incident Management, on its head. After all, it has already done this for IT Capacity Planning, where it is now possible to predict and avoid future incidents at a workload level.

Within IT capacity planning, forecasting (predicting, if you like) has always been a key feature of the discipline. It was used to ensure that large chunks of demand, whether from growth or change, could be met, with the focus on the strategic horizon rather than the day-to-day operation. If there are capacity issues, the Service Operation process of Incident Management informs the Service Design process of Capacity Management so that they can be dealt with as part of future Service Design activity.

Incident Management should inform IT Capacity Planning about incidents logged due to capacity or performance issues, and this intelligence would then be used to assist in the diagnosis and resolution of incidents. The idea that Capacity Management informs Incident Management of future, avoidable incidents, or indeed of how to deal with them, is a relatively new concept.

Playing the tactical game

Technological advances have opened many new areas of innovation and opportunity in this space. Virtualization, automation, big data and predictive analytics have empowered IT capacity planning to extend into day-to-day management at a more granular and forensic level, rather than focusing solely on strategic activity. The following are the four major drivers which have spurred on this evolution:

  • Virtualization – or more importantly – the hypervisor

Whilst allowing multiple virtual workloads to operate on a single physical machine should make life more difficult, it actually simplifies things by reducing the number of information sources that need to be interrogated.

  • Data Automation

When dealing with different system management tools, vendors and formats, consider the number of data points generated. Take a 10,000 server estate over a single 24 hour period, capturing data at 5 minute intervals: even at one data point per server per sample, this would generate almost 3 million data points. For the information to be used for predictive analysis, we would recommend at least 30 days’ worth of monitoring data in order to gain worthwhile insight. Without automation it would take an army to schedule the retrieval, aggregation, cleansing, loading and transforming of the data from a number of bespoke sources in a meaningful timeframe.
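To make the scale concrete, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name and the `metrics_per_server` parameter are made up for this example, and it assumes one data point per server per sample unless told otherwise.

```python
# Rough data volume for a monitored estate.
# Assumption: each server emits `metrics_per_server` data points per sample.

MINUTES_PER_DAY = 24 * 60


def data_points(servers: int, interval_minutes: int,
                days: int = 1, metrics_per_server: int = 1) -> int:
    """Return the number of data points collected over `days` days."""
    samples_per_day = MINUTES_PER_DAY // interval_minutes
    return servers * samples_per_day * days * metrics_per_server


daily = data_points(servers=10_000, interval_minutes=5)
monthly = data_points(servers=10_000, interval_minutes=5, days=30)

print(daily)    # 2880000
print(monthly)  # 86400000
```

At 5 minute intervals each server is sampled 288 times a day, so the recommended 30 days of history comes to more than 86 million data points before a second metric is even considered.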

  • Big Data

Big Data delivers the ability to store massive amounts of data in a way that makes sense and allows for further manipulation. Hardware advances have brought cheaper storage, better scalability and more powerful compute, and together these have made Big Data a reality.

  • Predictive Analytics

And finally, analytics provides the ability to churn data in a multitude of ways, using pattern matching and algorithms to analyse an organisation’s IT operation and provide insight that would otherwise go unnoticed, whether that is an over-utilisation of resources or an impending shortfall. The analytics available today are essential if IT managers want to keep on top of the complexity and scale of their IT estate. They need to be confident in their knowledge of their IT infrastructure, and of the changing demands placed on it, in order to see what’s around the corner and avoid potential incidents.
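As a rough illustration of spotting an impending shortfall, the sketch below fits a straight line to a workload's recent daily peak utilisation and estimates how many days remain before it crosses a threshold. It is a deliberately simple stand-in, not the method any particular product uses: the function names, the 90% threshold and the sample data are all assumptions made for this example, and real workloads rarely grow in a perfectly straight line.

```python
from typing import Optional, Sequence, Tuple


def fit_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_breach(daily_peaks: Sequence[float],
                      threshold: float) -> Optional[float]:
    """Days from the latest sample until the trend line reaches `threshold`.

    Returns None when utilisation is flat or falling (no breach predicted).
    """
    slope, intercept = fit_trend(daily_peaks)
    if slope <= 0:
        return None
    latest_day = len(daily_peaks) - 1
    breach_day = (threshold - intercept) / slope
    return max(0.0, breach_day - latest_day)


# Hypothetical data: 30 days of peak CPU utilisation (%), rising 0.5 points a day.
history = [60 + 0.5 * day for day in range(30)]
remaining = days_until_breach(history, threshold=90.0)
print(remaining)  # 31.0
```

A figure like this, produced per workload each day, is the kind of output that lets Capacity Management warn Incident Management before anything has actually failed.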

Zooming in

For IT capacity planning, the unit of currency has reduced from the physical machine to the individual workload. The timeframe has shortened too, providing short term tactical information while improving our ability to understand and model long term strategic actions. Changing the relationship between Incident Management and IT Capacity Planning allows you to identify shortfalls in advance, sidestep the avoidable and turn your Incident Management process on its head.

This article was contributed by Stuart Higgins, Technical Evangelist at Sumerian.

The Holy Trinity of IT Service Management

People, technology and process are the components that make up the IT Service Management triumvirate. Having already identified the technology trends, and in particular how predictive analytics will impact incident management, what can we say about the other two members of this very exclusive club?

While process tends to lead the way, it needs people to champion it, and technology to support it. Technology, in the grand scheme of things, tends to be the easiest part to implement as long as it exists and is fit for purpose.

Low level detection

The ability to detect and avoid incidents isn’t something that’s included in the ITIL manual. We could spin it into something to do with Continual Service Improvement, but activities in this area tend to be run on a project basis. They are in effect more likely to be elements of a change programme.

So what can be done when dealing with information relating to the future at such a granular level on a daily basis? The simplest thing would be to treat predictive events as actual incidents, pop them into a team’s queue and let them deal with them alongside everything else.

But what priority should they be given? The predicted incident can’t be high, as nothing is broken and nobody is screaming. On the other hand, if it is treated as a low priority, the issue may never be dealt with in a timeframe that permits the incident to be avoided. Medium, then? Perhaps not: if the resolution requires additional spend, you need to conform to a purchasing timeframe, and once again the benefit of being able to avoid a failure may be lost.

The answer, unsurprisingly, is that it depends. It will depend on the organisation and how mature its processes are, how stable its services are, and its attitude to risk.
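One way to picture "it depends" is to compare how far away the predicted failure is with how long the fix will take, purchasing included. The sketch below is purely illustrative: the class, the function and the `safety_margin_days` parameter (standing in for an organisation's attitude to risk) are hypothetical, and the thresholds are not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class PredictedIncident:
    days_until_breach: float   # how far away the forecast failure is
    fix_lead_time_days: float  # time to resolve, including any purchasing cycle


def suggest_priority(incident: PredictedIncident,
                     safety_margin_days: float = 7.0) -> str:
    """Suggest a queue priority from the slack left before the predicted failure.

    `safety_margin_days` is an assumed knob: a risk-averse organisation
    would set it higher, a risk-tolerant one lower.
    """
    slack = incident.days_until_breach - incident.fix_lead_time_days
    if slack <= 0:
        return "high"    # work must start now, or it is already too late
    if slack <= safety_margin_days:
        return "medium"  # little room for delay
    return "low"         # plenty of time; can wait in the queue


# Same predicted failure 30 days out, three different remedies.
config_change = PredictedIncident(days_until_breach=30, fix_lead_time_days=2)
quick_purchase = PredictedIncident(days_until_breach=30, fix_lead_time_days=28)
slow_purchase = PredictedIncident(days_until_breach=30, fix_lead_time_days=45)

print(suggest_priority(config_change))   # low
print(suggest_priority(quick_purchase))  # medium
print(suggest_priority(slow_purchase))   # high
```

The same prediction lands in three different places in the queue depending on the remedy, which is why a single fixed priority for predicted incidents rarely works.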

A stitch in time?

How many organisations will zealously fund proactive remedial work? Securing the budget to keep things in a current and supported state is difficult and at times impossible. I’m sure every organisation has a server somewhere that has effectively been shrink wrapped: it is no longer supportable and needs to be kept as protected from change as possible, because the service it supports provides good, perhaps even essential, value to the business.

It is unlikely that an IT department will be given a blank cheque to allow it to respond to predicted events. Does this mean that things will knowingly be left to fail?

Therein lies another people aspect. How are IT Service Management staff rewarded?

Fire fighter or keeper of the peace?

If services operate without issue, the IT department becomes the focus of cost cutting.

If on the other hand systems fail, all thanks are given to those that worked tirelessly through the night, surviving only on pizzas and vending machine coffee. Like it or lump it, the reality is that in these types of scenarios, those that are seen to be doing are those that progress.

IT has a very real culture of martyrdom embedded within it that will be difficult to change.

Of course there will still be unexpected incidents that can’t be predicted, but in a world where we can now identify and avoid incidents, there needs to be a balance that encourages and rewards the proactive as much as the reactive.

Different thinking is needed, together with a different reward structure. Behavioural psychology established long ago that you have to reward the behaviours you want.

Are your service team keeping the peace or fighting fires? I’d suggest you want people calmly going about their business to let business go about business. Avoiding the avoidable helps them to do just that.


 
