ITIL Roles – Which Roles Can Be Filled By One Person?

Neven Zitek, SPAN

Just looking at the sheer number of ITIL functions and roles may leave you wondering – how do you fit a limited number of IT staff into so many roles? It’s obvious that one person will act in several roles, but how do you combine them optimally? Of course, it all depends on the size of your organization and which ITIL processes you’ve implemented, but none of that changes the fact that some roles fit well together, and some of them don’t.

 

 ITIL roles that fit together within the Service Lifecycle


 

Figure 1: ITIL roles that can be managed by a single person, and the relationship between roles and the ITIL Service Lifecycle.

The Business Relationship Manager role is responsible for managing and maintaining good relationships with customers and, most importantly, for ensuring that the Service Catalogue adequately meets customer needs. Because part of the customer relationship is agreeing upon and respecting agreed Service Levels, the Service Level Manager and Business Relationship Manager roles fit well together. The Service Level Manager’s focus is more oriented toward the initial negotiation of service levels, but that makes him a good candidate for Business Relationship Manager, as he will be very familiar with the customer’s needs.

Risk Manager and Service Continuity Manager are both oriented toward the future, looking for the best possible outcome in case of undesired events. They fit well together, as both roles are responsible for risk management, threat identification and mitigation, and ensuring minimal or acceptable impact on service delivery if those events actually occur. The difference is that the Service Continuity Manager is focused on force majeure and disaster scenarios, while the Risk Manager is focused on risk assessment of individual assets and their vulnerabilities. Even with those differences, these two roles can easily be filled by a single person.

The Capacity Manager’s responsibility is ensuring that all infrastructure and services (if provided externally) are able to deliver performance and capacity within agreed levels, in a cost-effective manner. These responsibilities match nicely with the Availability Manager role, which adds responsibility for meeting agreed service availability. Both roles include planning, measuring, analyzing, and improving available resources against agreed and expected service levels; however, Capacity Management is concerned with personnel resources as well (e.g., an overnight backup not completed because there was no technician to change tapes), and Availability Management is not. As both roles include monitoring and measuring the performance of individual service components, it might also be a perfect match to include the Problem Management role, as the Problem Manager’s main task is to prevent incidents from happening and to minimize the impact of incidents that do happen. Insight into the status of individual service components is a good argument for combining the Capacity and Availability Manager roles with the Problem Manager role.

The responsibility for maintaining information about assets, Configuration Items (CIs), and their relationships lies with the Service Asset and Configuration Manager. This very important, yet laborious, role is very similar to the Knowledge Manager‘s role, whose responsibility is to maintain information about available knowledge. That similarity in processes justifies the decision to assign those two roles to a single person.

And, as I mentioned in an earlier post, Incident Management: How to separate roles at different support levels, another good role-sharing fit is Incident Manager and Service Desk Manager. Even though the Service Desk Manager has a slightly larger scope of responsibilities, what those two roles have in common is the aim to resolve incidents as soon as possible. In general, the Service Desk is the place where all incidents will be reported; therefore, it makes perfect sense to try and resolve them on the spot.

Combining roles is a challenge for both smaller and larger companies. Obviously, smaller companies are de facto forced to fit as many roles as humanly possible into a single person, as there is no alternative. Larger companies may have the luxury of splitting roles among as many people as they see fit; however, with so many ITIL roles available, it may not be wise to dedicate a single person to every single role just because you can. If you are fortunate enough to have all the necessary personnel available to take all the roles, think about the workload across the lifecycle. For example, if you don’t plan on releasing new services on a daily basis, do you need both a Test Manager and a Release Manager? (Note that you shouldn’t combine those two roles, so please continue reading to find out why.)

In my opinion and experience, combining ITIL roles is always an option, as long as you take workload and common sense into consideration.

ITIL roles that shouldn’t be mixed together within the Service Lifecycle

While these are good examples of a single person acting in a multi-role environment, there are some obvious and less obvious role combinations that should be avoided.

The obvious role combination that should be avoided is Test Manager and Release Manager. While the Release Manager is responsible for planning, controlling, and releasing a service into the live / operational environment, the Test Manager is responsible for performing all necessary testing to ensure that the deployed service meets requirements. It’s an obvious conflict of interest: the Release Manager will strive to get the service operational as soon as possible, while the Test Manager will always want to take as much time as possible in order to test the service properly.

A less obvious role combination that ITIL experts commonly agree should be avoided is Incident Manager and Problem Manager. The Incident Manager is responsible for handling an incident in a way that results in fast incident resolution or a workaround. The Problem Manager, on the other hand, is not interested in quick fixes, but rather in the root cause of the incident – which may take much more time than any Incident Manager is ready to accept.

Another less obvious combination of ITIL roles that should be avoided is making a Service Owner (any) Process Owner as well. The Service Owner is responsible for delivering the service in question (e.g., the e-mail service) within agreed service levels. A Process Owner (e.g., of Change Management, Incident Management, Service Portfolio Management, etc.) is responsible for ensuring that the process in question is fit for its purpose and is run in an optimal way. As Process Owner, this person is in charge of that process across all the services he does not own, and may start looking at those other services through “Service Owner glasses,” which should be avoided if possible.

Combining ITIL roles – if at first you don’t succeed, try again

Just remember that ITIL is a best-practice framework with a logical and easy-to-follow structure. Combining multiple roles in one person should be done using common sense – you wouldn’t appoint someone to report to himself, or to approve his own recommendations, budget, and technical solution, the same way you wouldn’t appoint a wolf to guard the sheep. Combining ITIL roles is a challenge, and it takes time and experience to understand and foresee the potential pitfalls certain role combinations may bring upon you. On the other hand, you can use that time to notice and change any “bad fits” that may already exist. Just don’t be afraid to make a change; if anything, ITIL is all about change.

 

This article was contributed by Neven Zitek of SPAN.

Problem management challenges and critical success factors

Following his presentation on “problem management challenges and critical success factors” at the 8th annual itSMF Estonia conference in December, Tõnu Vahtra, Head of Service Operations at Playtech (the world’s largest publicly-traded online gambling software supplier) gives us his advice on understanding problem management, steps to follow when implementing the process, and how to make it successful. 

Tõnu Vahtra

Problem management is not a standalone process

Incident management and event management

Problem management cannot exist without the incident management process, and there is a strong correlation between incident management maturity and problem management efficiency and results. Incident management needs to ensure that problems are detected and properly documented (e.g., the basic incident management requirement that all requests must be registered). Incident management works back-to-back with the event management process; if both of these processes are KPI-managed, then any anomalies in alarm or incident trends can be valuable input to problem management. Incident management also has to ensure that, in parallel with restoring service during an incident, relevant information is collected during or right after resolution (e.g., a server memory dump before restart) so that more information is available to identify the incident’s root cause(s).
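As an illustration of how KPI-managed incident data can feed problem management, here is a minimal sketch (Python, with made-up records and a hypothetical threshold; it is not Playtech’s tooling) that flags categories whose weekly incident count jumps compared with the previous week – exactly the kind of trend anomaly that makes a good problem candidate:

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (date, category). In practice these would
# come from the ITSM tool's incident export.
incidents = [
    (date(2015, 1, 5), "email"), (date(2015, 1, 6), "email"),
    (date(2015, 1, 7), "network"), (date(2015, 1, 12), "email"),
    (date(2015, 1, 13), "email"), (date(2015, 1, 14), "email"),
]

def weekly_counts(records):
    """Count incidents per (ISO week number, category). Simplification: ignores year boundaries."""
    counts = Counter()
    for day, category in records:
        counts[(day.isocalendar()[1], category)] += 1
    return counts

def trending_categories(records, threshold=2):
    """Flag categories whose weekly count rose by at least `threshold`
    versus the previous week -- candidate input for problem management."""
    counts = weekly_counts(records)
    flagged = set()
    for (week, category), count in counts.items():
        previous = counts.get((week - 1, category), 0)
        if count - previous >= threshold:
            flagged.add(category)
    return flagged

print(trending_categories(incidents))  # prints {'email'}
```

The point is not the arithmetic but the feedback loop: once incidents are consistently registered and categorised, even a crude trend check like this gives problem management something concrete to investigate.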

Critical incident management

Problem management at Playtech gains a lot from the critical incident management function, which is carried out by dedicated Critical Incident Managers who have the widest logical understanding of all products and services and years of experience solving critical incidents. They perform incident post-mortem analysis following all major incidents, and they also start the initial root cause analysis (RCA) before handing this task over to problem management. RCA is handed over to Problem Managers within 24 hours of the incident end time, during which the Critical Incident Manager collects and organizes all available information about the incident. Critical Incident Managers usually have no problems allocating support/troubleshooting resources from all support levels, as critical incident troubleshooting and initial preventive measures are considered the highest priority within the mandate from top corporate management. All of the above ensures high-quality input for problem management in a timely manner.

Change management and knowledge management

In the Error Control phase, the two most important processes for problem management are change management and knowledge management. Most action items identified during RCA are implemented through change management; the stronger that process is, the less problem management has to be involved directly in change planning (providing abstract goals vs. a concrete action plan or task list for implementation), and the smaller the risk of additional incidents during change implementation. Change management also needs to have the capability and a documented process flow to implement emergency changes in an organized way, with minimum impact, to stop recurring critical incidents as fast as possible.

Knowledge management is vital for incident management, ensuring that service desk specialists can quickly find and action specific workarounds for known errors while their resolution is still in progress with problem management. Regular input and high attention are needed from problem management to ensure that every stakeholder in the known error database (KEDB) can easily locate information relevant to his or her role, that all units are aware of the information relevant to them, and that all the information in the KEDB is relevant and up to date. At Playtech, problem management also manages process errors identified from root cause analysis, and process improvements only last when they are properly documented, communicated to all relevant stakeholders, and backed by additional controls to detect deviations from the optimal process. Local and cross-disciplinary knowledge management for process knowledge has an important role here.

Defect management

Problem management has to go beyond ITSM processes in a software development/services corporation like Playtech and also integrate with the software development lifecycle (SDLC). For this purpose, a separate defect management sub-process has been established at Playtech under problem management. Defect management manages the lifecycle of all significant software defects identified in production environments and aligns defect-fixing expectations between business and development departments. Defect Managers ensure a consistent, prioritized overview of all significant outstanding software defects, which warrants optimal usage of development resources and minimizes the overall business impact of defects. They act as a single point of contact for all defect-related communication and ensure high transparency of the defect-fixing process and fix ETAs. Defect Managers define the defect prioritization framework between business and development key stakeholders and govern the agreed targets.

Software problem management

Problem management leads the software problem management process through defect management. Under the software problem management process (which is usually run by a quality assurance team in the relevant development units), development teams perform root cause analysis for defects highlighted for RCA by problem management or raised internally. Every defect is analyzed from two angles: firstly, why the defect was created by development; and secondly, if the defect was created, why it was not identified during internal QA but was instead reported from a production environment first. Root causes and action items are defined from both questions and tracked with relevant stakeholders. This process ensures that similar defects will not be created, or will be identified internally, in the future. Even more importantly, there is a direct feedback channel from the field to the developer or team who created the defect, so that they gain a full understanding of the business implications of their activities.

Important steps to take problem management to the next level

The problem management unit has to become more proactive and get more involved in the service design and service transition phases to identify and eliminate problems before they reach production environments. Problem management needs resources to contribute to pre-production risk management, and even more importantly, this involvement has to be valued and enforced by corporate senior management, as it may take additional resources and delay time-to-market in some situations.

The Problem Management Team itself can free up resources for proactive tasks by reducing its direct participation in reactive problem management activities. This has to be done by advocating the problem management mindset across the entire corporation (encouraging people to think in terms of cause and effect, with the desire to understand the causes of issues and push their resolution for continuous improvement), so that each major domain has its own Problem Coordinators and identifies root causes and tracks action items independently, and problem management can take on more of a defining and governing role. To assert the value created by problem management and enlist more people to spread the word about problem management ideas, it is essential to visualize the process and explain the relations between incidents, root causes, and action items to all stakeholders, so that they understand how their task contributes to the bigger picture.

There is a high number of operationally independent problem management stakeholders at Playtech, and implementing a KPI framework that is fit to measure and achieve problem management goals and is applicable to all major stakeholders, both individually and across stakeholders, seems an almost impossible task. The saying ”You get what you measure“ is very true in problem management: no stakeholder wants to be measured by problems that involve other stakeholders, and they take actions to remove such problems from their statistics instead of focusing on the problem and its solution. At the same time, problem management tends to be most inefficient and difficult for problems spreading across multiple divisions. A Problem Manager’s role and assertiveness in facilitating a constructive and systematic process towards the resolution of such problems is crucial. And still, problem management needs to find a creative approach to reflect such problems in KPI reports, to present them as part of the big picture, and to sell them to executive management to get their sponsorship for major improvement tasks that compete for the same resources with business development projects, while the latter have a much clearer ROI.

No problem exists in isolation: the problem records in the KEDB can be related to specific categories or domains and also related hierarchically to each other (there can be major principal problems that consist of smaller problems), and specific action items can contribute to the resolution of more than one problem. Problem categories cannot be restricted to a fixed list, as a problem can have multiple triggers and causes, and it should be possible to relate a problem record to all interested stakeholders; for this, dynamic tagging seems to be a better approach than a limited number of categories (for example, a list of problems that are related to a big project). Instead of looking at each problem in isolation, each problem should be approached and prioritized in the right context, fully considering its implications and surroundings. No ITSM tool today provides full capabilities for problem tagging or for creating the relations mentioned without custom development, not to mention the visualization of such relations, which would be a powerful tool in trend or what-if analysis and problem prioritization. Playtech is still looking for the optimal problem categorization model and a tool that would enable the use of such a model.
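To make the idea of dynamic tagging and hierarchical problem relations concrete, here is a minimal sketch of such a record structure (Python; the field names and example records are hypothetical and do not reflect any particular ITSM tool or Playtech’s KEDB):

```python
from dataclasses import dataclass, field

@dataclass
class ActionItem:
    description: str
    done: bool = False

@dataclass
class ProblemRecord:
    ident: str
    title: str
    tags: set[str] = field(default_factory=set)                      # dynamic tagging, not one fixed category
    children: list["ProblemRecord"] = field(default_factory=list)    # smaller problems under a principal problem
    action_items: list[ActionItem] = field(default_factory=list)

def problems_tagged(root: ProblemRecord, tag: str):
    """Walk a problem hierarchy and yield every record carrying `tag`,
    e.g. all problems related to one big project."""
    if tag in root.tags:
        yield root
    for child in root.children:
        yield from problems_tagged(child, tag)

# One action item can contribute to the resolution of several problems.
shared_fix = ActionItem("Upgrade the shared load balancer firmware")
minor = ProblemRecord("PRB-102", "Intermittent API timeouts",
                      tags={"project-alpha", "network"},
                      action_items=[shared_fix])
major = ProblemRecord("PRB-100", "Recurring platform outages",
                      tags={"project-alpha"},
                      children=[minor],
                      action_items=[shared_fix])

print([p.ident for p in problems_tagged(major, "network")])  # ['PRB-102']
```

The design point is that tags, parent/child links and shared action items are all many-to-many relations, which is exactly what a single fixed category field cannot express – and why visualizing those relations is so useful for prioritization.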

Advice to organizations that are planning to start the implementation of the problem management process

For organizations starting the implementation of the problem management process, my advice is: don’t take all the process activities from the ITIL book and start blindly implementing them; that is not the way to start the implementation of this process, or any other. Problem management success depends mostly on a specific mindset, and in an already established organization it may take years for the right mindset to be universally accepted. The formal problem management process should initially be mostly invisible to all stakeholders outside of the Problem Management Team, to avoid the natural psychological tendency to resist change.

It is essential to allocate dedicated resources to problem management (Playtech assigned a dedicated person to problem management in 2007; any problem management activities prior to that were ad hoc and inconsistent). The problem management unit should start by performing root cause analysis and removing the root causes of present major incidents that have the highest financial and reputational impact on the organization. If such incidents are being closely monitored by senior management and key stakeholders, solving them can earn the essential credit for problem management to get attention and resources for solving problems elsewhere. Secondly, problem management should look at the most obvious recurring alarm and incident trends that result in a high support/maintenance cost. By resolving such problems it gains the trust of the support and operational teams, whose workload is reduced and who become more willing to contribute and cooperate in future root cause analysis. A final problem review before closure is an often-neglected task, but to improve the process it is essential to assess whether the given problem was handled efficiently and to give feedback about the problem’s solution to all relevant parties. Proactive problem management and KPIs are not essential to start with; Problem Managers should concentrate on activities with the highest exposure and clear value.

In summary

There will definitely be setbacks in problem management, and in order to make a real difference with this process and increase its maturity over time, it has to have at least three things: a strong and assertive leader who is persistent in advocating problem management; a continuous improvement mindset throughout the organization; and the ability to find a way forward from dead-end situations with out-of-the-box thinking. When there is no such leader, involving external problem management experts may also help as a temporary measure to get the focus back on the most important activities. However, this measure is not sufficient in the long term, as the problem management process constantly needs to evolve with its organization and adjust to significant operational changes to remain fit for purpose and relevant.

You can download Tõnu’s presentation in full here.

Service Improvement at Cherry Valley

Problem, risk, change, CSI, service portfolio, projects: they all make changes to services.  How they inter-relate is not well defined or understood.  We will try to make the model clearer and simpler.

Problem and Risk and Improvement

The crew was not warned of the severe weather ahead

In this series of articles, we have been talking about an ethanol train derailment in the USA as a case study for our discussions of service management.  The US National Transport Safety Board wrote a huge report about the disaster, trying to identify every single factor that contributed and to recommend improvements.  The NTSB was not doing Problem Management at Cherry Valley.  The crews cleaning up the mess and rebuilding the track were doing problem management.  The local authorities repairing the water reservoir that burst were doing problem management.  The NTSB was doing risk management and driving service improvement.

Arguably, fixing procedures which were broken was also problem management.   The local dispatcher failed to tell the train crew of a severe weather warning as he was supposed to do, which would have required the crew to slow down and watch out.  So training and prompts could be considered problem management.

But somewhere there is a line where problem management ends and improvement begins, in particular what ITIL calls continual service improvement or CSI.

In the Cherry Valley incident, the police and railroad could have communicated better with each other.  Was the procedure broken?  No, it was just not as effective as it could be.  The type of tank car approved for ethanol transportation was not required to have double bulkheads on the ends to reduce the chance of puncture.  Fixing that is not problem management; it is improving the safety of the tank cars.  I don’t think improving that communications procedure or the tank car design is problem management, otherwise if you follow that thinking to its logical conclusion then every improvement is problem management.

A distinction between risks and problems

But wait: unreliable communications procedure and the single-skinned tank cars are also risks.  A number of thinkers, including Jan van Bon, argue that risk and problem management are the same thing.  I think there is a useful distinction: a problem is something that is known to be broken, that will definitely cause service interruptions if not fixed; a “clear and present danger”.  Risk management is something much broader, of which problems are a subset.  The existence of a distinct problem management practice gives that practice the focus it needs to address the immediate and certain risks.

(Risk is an essential practice that ITIL – strangely – does not even recognise as a distinct practice; the 2011 edition of ITIL’s Continual Service Improvement book attempts to plug this hole.  COBIT does include risk management, big time.  USMBOK does too, though in its own distinctive  way it lumps risk management under Customer services; I disagree: there are risks to our business too that don’t affect the customer.)

So risk management and problem management aren’t the same thing.  Risk management and improvement aren’t the same thing either.  CSI is about improving the value (quality) as well as reducing the risks.

To summarise all that: problem management is part of risk management which is part of service improvement.

Service Portfolio and Change

Now for another piece of the puzzle.  Service Portfolio practice is about deciding on new services, improvements to services, and retirement of services.  Portfolio decisions are – or should be – driven by business strategy: where we want to get to, how we want to approach getting there, what bounds we put on doing that.

Portfolio decisions should be made by balancing value and risk.  Value is benefits minus costs.  There is a negative benefit and a set of risks associated with the impact of building a new service on existing services:  there is the impact of the project dragging people and resources away from production, and the ongoing impact of increased complexity, the draining of shared resources, etc.  So portfolio decisions need to be made holistically, in the context of both the planned and live services.  And in the context of retired services too: “tell me again why we are planning to build a new service that looks remarkably like the one we killed off last year?”.  A lot of improvement is about capturing the learnings of the past.
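To make that balancing act concrete, here is a toy sketch (Python; the candidate items, figures and weights are all invented for illustration, not a method the author prescribes) that scores candidate portfolio items by value – benefits minus costs – and then discounts for risk and for the drag each item places on live services:

```python
# Toy numbers only: score = (benefits - costs) minus penalties for risk
# and for BAU impact. The weights are illustrative assumptions.
candidates = [
    # name,                  benefits, costs, risk (0-1), bau_impact (0-1)
    ("New payments service",  900,      400,   0.30,       0.40),
    ("Patch prod servers",    200,       50,   0.05,       0.10),
    ("Windows upgrade",       300,      250,   0.20,       0.30),
]

def score(benefits, costs, risk, bau_impact, risk_weight=500, bau_weight=300):
    value = benefits - costs                      # value = benefits minus costs
    return value - risk * risk_weight - bau_impact * bau_weight

ranked = sorted(candidates, key=lambda c: score(*c[1:]), reverse=True)
for name, *figures in ranked:
    print(f"{name}: score {score(*figures):.0f}")
```

The interesting part is not the formula but what goes into it: the BAU-impact term is precisely the “holistic” view that a project-only portfolio misses.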

Portfolio management is a powerful technique that is applied at multiple levels.  Project and Programme Portfolio Management is all the rage right now, but it only tells part of the story.  Managing projects in programmes and programmes in portfolios only manages the changes that we have committed to make; it doesn’t look at those changes in the context of existing live services as well.  When we allocate resources across projects in PPM we are not looking at the impact on business-as-usual (BAU); we are not doling out resources across projects and BAU from a single pool.  That is what a service portfolio gives us:  the truly holistic picture of all the effort in our organisation across change and BAU.

A balancing act

Service portfolio management is a superset of organisational change management.  Portfolio decisions are – or should be – decisions about what changes go ahead for new services and what changes are allowed to update existing services, often balancing them off against each other and against the demands of keeping the production services running.  “Sure the new service is strategic, but the risk of not patching this production server is more urgent and we can’t do both at once because they conflict, so this new service must wait until the next change window”.  “Yes, the upgrade to Windows 13 is overdue, but we don’t have enough people or money to do it right now because the new payments system must go live”.  “No, we simply cannot take on another programme of work right now: BAU will crumble if we try to build this new service before we finish some of these other major works”.

Or in railroad terms: “The upgrade to the aging track through Cherry Valley must wait another year because all available funds are ear-marked for a new container terminal on the West Coast to increase the China trade”.  “The NTSB will lynch us if we don’t do something about Cherry Valley quickly.  Halve the order for the new double-stack container cars”.

Change is service improvement

Everything we change is service improvement. Why else would we do it?  If we define improvement as increasing value or reducing risk, then everything we change should be to improve the services to our customers, either directly or indirectly.

Therefore our improvement programme should manage and prioritise all change.  Change management and service improvement planning are one and the same.

So organisational change management is CSI. They are looking at the beast from different angles, but it is the same animal.  In generally accepted thinking, organisational change practice tends to be concerned with the big chunky changes and CSI tends to be focused more on the incremental changes.  But try to find the demarcation between the two.   You can’t decide on major change without understanding the total workload of changes large and small.  You can’t plan a programme of improvement work for only minor improvements without considering what major projects are planned or happening.

In summary, change/CSI is one part of service portfolio management, which also considers delivery of BAU live services.  A railroad will stop doing minor sleeper (tie) replacements and other track maintenance when they know they are going to completely re-lay or re-locate the track in the near future.  After decades of retreat, railroads in the USA are investing in infrastructure to meet a coming boom (China trade, ethanol madness, looming shortage of truckers); but they had better beware not to draw too much money away from delivering on existing commitments, and not to disrupt traffic too much with major works.

Simplifying service change

ITIL as it is today seems to have a messy complicated story about change.  We have a whole bunch of different practices all changing our services, from  Service Portfolio to Change Management to Problem Management to CSI.  How they relate to each other is not entirely clear, and how they interact with risk management or project management is undefined.

There are common misconceptions about these practices.  CSI is often thought of as “twiddling the knobs”, fine-tuning services after they go live.  Portfolio management is often thought of as being limited to deciding what new services we need.  Risk management is seen as just auditing and keeping a list.  Change Management can mean anything from production change control to organisational transformation depending on who you talk to.

It is confusing to many.  If you agree with the arguments in this article then we can start to simplify and clarify the model:

Figure: Rob England’s ITSM model
I have added in the Availability, Capacity, Continuity, Incident and Service Level Management practices as sources of requirements for improvement.  These are the feedback mechanisms from operations.  In addition, the strategy, portfolio and request practices are sources of new improvements.  I’ve also placed the operational change and release practices in context.

These are merely  the thoughts of this author.  I can’t map them directly to any model I recall, but I am old and forgetful.  If readers can make the connection, please comment below.

Next time we will look at the author’s approach to CSI, known as Tipu.

Image credit: © tycoon101 – Fotolia.com

The RBS Glitch – A Wake Up Call?

More than a fortnight after a “glitch” affected Royal Bank of Scotland (RBS), NatWest and Ulster Bank accounts in the last couple of weeks of June, the fall-out continues, with the manual processing backlog still affecting Ulster Bank customers.

Now, the Online Oxford Dictionary defines a glitch as:
a sudden, usually temporary malfunction or fault of equipment

I don’t think anyone affected would see it in quite the same way.

So when did this all happen?

The first I knew about it was a plaintive text from a friend who wanted to check her balance, and could not because:
“My bank’s computers are down”
By the time the evening rolled around, the issue was becoming national news, and it was very clear that this was more than just a simple outage.

On the night of Tuesday 19th June, the batch jobs that update accounts were not processed, and branches were seeing customer complaints about their balances.

As the week progressed, it became clear that this was no simple ‘glitch’, but the result of some failure somewhere, affecting 17 million customers.

What actually happened?

As most people can appreciate, transactions to and from people’s accounts are typically handled and updated using batch processing technology.

However, that software requires maintenance, and an upgrade to the software had to be backed out; as part of the back-out, it appears that the scheduling queue was deleted.

As a result, inbound payments were not being registered and balances were not being updated correctly, with the obvious knock-on effect of funds showing as unavailable for bills to be paid, and so on.

The work to fix the issues meant that all the information that had been wiped had to be re-entered.
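To see why losing the scheduling queue is so damaging, here is a deliberately simplified sketch of a dependency-ordered batch schedule (Python; the job names are invented, and this bears no relation to how CA-7 actually stores or runs its schedule):

```python
# A hypothetical stand-in for a batch schedule: each job names the jobs it
# depends on, and the scheduler runs them in dependency order.
from graphlib import TopologicalSorter

schedule = {
    "load_transactions": set(),
    "post_payments": {"load_transactions"},
    "update_balances": {"post_payments"},
    "produce_statements": {"update_balances"},
}

def run_nightly(schedule):
    # static_order() yields each job only after all of its dependencies.
    for job in TopologicalSorter(schedule).static_order():
        print(f"running {job}")

run_nightly(schedule)

# If the schedule definitions are wiped, the scheduler has nothing to run:
# payments stop posting and balances stop updating until every job and its
# dependencies are re-entered by hand.
run_nightly({})   # runs nothing
```

The schedule is the only place that knows which jobs exist and in what order they must run, which is why recovery meant painstakingly re-entering that information rather than simply restarting the software.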

Apparently the order of re-establishing accounts was RBS first, then NatWest, and customers at Ulster Bank were still suffering the effects as we moved into early July.

All the while news stories were coming in thick and fast.

The BBC reported on someone who had to remain an extra night in jail because his parole bond could not be verified.

House sales were left in jeopardy as money was not showing as being transferred.

Even if you did not have your main banking with any of the three banks in the RBS group, you were likely to be affected.

If anyone in your payment chain banked with any of those banks, transactions were likely to be affected.

Interestingly enough, I called in to a local branch of one of the affected banks during the week of the crisis, as it was the only day I had to pay in money, and it was utter chaos.

And I called in again this week and casually asked my business account manager how things had been.

The branches had very little information coming to them at the height of the situation.

When your own business manager found their card declined while buying groceries that week, you have to wonder about the credibility of the bank’s processes.

Breaking this down, based on what we know

Understandably, RBS has been reticent to provide full details, and there has been plenty of discussion as to the reasons, which we will get to, but let’s start by breaking down the events based on what we know.

  • Batch Processing Software

What we are told is that RBS uses CA Technologies’ CA-7 batch processing software.

A back-out error was made after a failed update to the software, when the batch schedule was completely deleted.

  •  Incidents Reported

Customers were reporting issues with balance updates to accounts early on in the week commencing 20th June, and soon it became clear that thousands of accounts were affected across the three banks.

Frustratingly, some, but not all, services were affected – ATMs were still working for small withdrawals, but some online functions were unavailable.

  •  Major Incident

As the days dragged on and the backlog of transactions grew, the reputation of RBS and NatWest in particular came under fire.

By the 21st June, there was still no official fix date, and branches of NatWest were being kept open for customers to be able to get cash.

  •  Change Management

Now we get to the rub.

Initial media leaks pointed to a junior administrator making an error in backing out the software update and wiping the entire schedule, causing the automated batch process to fail.

But what raised eyebrows in the IT industry initially, was the thorny subject of outsourcing.

RBS (let me stress, like MANY companies) has outsourced elements of IT support off-shore.

Some of that has included administration support for their batch processing, but with a group also still in the UK.

Many of these complex systems have unique little quirks.  Teams develop “in-house” knowledge, and knowledge is power.

Initial reports seemed to indicate that the fault lay with the support and administration for the batch processing software, some of which was off-shore.

Lack of familiarity with the system also pointed to possible issues in the off-shoring process.

However, in a letter to the Treasury Select Committee, RBS CEO Stephen Hester informed the committee that the maintenance error had occurred within the UK based team.

  •  Documentation

The other factor is the human need to have an edge on the competition – after all, knowledge is power.

Where functions are outsourced, there are two vital elements that must be focussed on (and all too often are either marginalised or ignored due to costs):

1)      Knowledge Transfer

I have worked with many clients where staff who will be supporting the services to be outsourced are brought over to learn (often from the people whose jobs they will be replacing).

Do not underestimate what a very daunting and disturbing experience this will be, for both parties concerned.

2)      Documentation

Even if jobs are not being outsourced, documentation is often the scourge of the technical support team.  It is almost a rite of passage to learn the nuances of complex systems.

Could better processes help?

Given such a negative situation, I think it is worth looking at the effort that went into resolving it.

The issues were made worse by the fact that the team working to resolve the problem could not access the record of transactions that were processed before the batch process failed.

But – the processes do exist for them to manually intervene and recreate the transactions, albeit via lengthy manual intervention.

Teams worked round the clock to clear the backlog, as batches would need to be reconstructed once they worked out where they failed.

In Ulster Bank’s case, they were dependent on some NatWest systems, so again something somewhere must dictate the order in which to recover, or else people would be trying to update accounts all over the place.

Could adherence to processes have prevented the issue in the first place?

Well undoubtedly.  After all, this is not the first time the support teams will have updated their batch software, nor will it have been the first time they have backed out a change.

Will they be reviewing their procedures?

I would like to hope that the support teams on and off shore are collaborating to make sure that processes are understood and that documentation is bang up-to-date.

What can we learn from this?

Apart from maybe putting our money under the mattress, I think this has been a wake-up call for many people who, over the years, have put all their faith in the systems that allow us to live our lives.

Not only that: in an environment where quite possibly people have been the target of outsourcing in their own jobs, it was a rude awakening to some of the risks of shifting support for complex integrated systems without effective training, documentation and, more importantly, back-up support.

Prior to Mr Hester’s written response to the Treasury Select Committee, I had no problem believing that elements such as poor documentation/handover, and a remote unfamiliarity with a system could have resulted in a mistaken wipe of a schedule.

What this proves is that anyone, in ANY part of the world can make a mistake.