More than a fortnight after a “glitch” in the last couple of weeks of June affected Royal Bank of Scotland (RBS), NatWest and Ulster Bank accounts, the fallout continues, with a manual processing backlog still affecting Ulster Bank customers.
Now, the online Oxford Dictionary defines a glitch as:
a sudden, usually temporary malfunction or fault of equipment
I don’t think anyone affected would see it in quite the same way.
So when did this all happen?
The first I knew about it was a plaintive text from a friend who wanted to check her balance and could not, because:
“My bank’s computers are down”
By the time the evening rolled around, the issue was becoming national news, and it was very clear that this was more than just a simple outage.
On the night of Tuesday 19th June, the batch processes that update accounts failed to run, and branches were seeing customer complaints about their balances.
As the week progressed, it became clear that this was no simple ‘glitch’ but a serious failure, affecting 17 million customers.
What actually happened?
As most people will appreciate, transactions to and from customers’ accounts are typically handled and updated using batch processing software.
That software requires maintenance, and an upgrade to it had to be backed out; as part of the back-out, it appears that the scheduling queue was deleted.
As a result, inbound payments were not being registered and balances were not being updated correctly, with the obvious knock-on effect of funds showing as unavailable, bills going unpaid, and so on.
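To make the mechanics concrete, here is a minimal sketch of why wiping a batch schedule is so destructive. The job names and scheduler shape are purely illustrative assumptions for this post, not CA-7’s actual interface: the point is that downstream jobs only run because the schedule says so, so an empty schedule means nothing updates at all.

```python
from collections import deque

# Hypothetical sketch of a batch scheduler's job queue. Job names are
# illustrative, not taken from any real banking system.
schedule = deque([
    "ingest_inbound_payments",   # register incoming transfers
    "post_transactions",         # apply debits/credits to accounts
    "update_balances",           # recompute available balances
    "generate_statements",       # customer-facing output
])

def run_batch(schedule):
    """Run jobs in order; each job relies on its predecessors having run."""
    completed = []
    while schedule:
        job = schedule.popleft()
        completed.append(job)    # stand-in for actually executing the job
    return completed

# A back-out that deletes the schedule leaves nothing to run:
schedule.clear()
print(run_batch(schedule))       # [] -- no jobs run, so balances never update
```

The failure mode is not that any one job crashed; it is that the list telling the system *what to run and in what order* ceased to exist.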
The work to fix the issues meant that all the information that had been wiped had to be re-entered.
Apparently the order of re-establishing accounts was RBS first, then NatWest, and customers at Ulster Bank were still suffering the effects as we moved into early July.
All the while news stories were coming in thick and fast.
The BBC reported on someone who had to spend an extra night in jail because his parole bond could not be verified.
House sales were left in jeopardy as money was not showing as being transferred.
Even if your main banking was not with any of the three banks in the RBS group, you were likely to be affected.
If anyone in your payment chain banked with any of those banks, transactions were likely to be affected.
Interestingly enough, I called in to a local branch of one of the affected banks during the week of the crisis, as it was the only day I had to pay in money, and it was utter chaos.
And I called in again this week and casually asked my business account manager how things had been.
The branches had very little information coming to them at the height of the situation.
When your own business manager found their card declined while buying groceries that week, you have to wonder about the credibility of the bank’s processes.
Breaking this down, based on what we know
Understandably, RBS has been reticent to provide full details, and there has been plenty of discussion as to the reasons, which we will get to, but let’s start by breaking down the events based on what we know.
- Batch Processing Software
What we are told is that RBS uses CA Technologies’ CA-7 batch processing software.
During the back-out of a failed update to that software, an error was made and the batch schedule was completely deleted.
- Incidents Reported
Customers were reporting issues with balance updates to accounts early in the week commencing 18th June, and it soon became clear that thousands of accounts were affected across the three banks.
Frustratingly, some, but not all, services were affected: ATMs were still working for small withdrawals, but some online functions were unavailable.
- Major Incident
As the days dragged on and the backlog of transactions grew, the reputations of RBS and NatWest in particular came under fire.
By 21st June there was still no official fix date, and branches of NatWest were being kept open so that customers could get cash.
- Change Management
Now we get to the rub.
Initial media leaks pointed to a junior administrator making an error in backing out the software update and wiping the entire schedule, causing the automated batch process to fail.
But what raised eyebrows in the IT industry initially, was the thorny subject of outsourcing.
RBS (like MANY companies, let me stress) has outsourced elements of its IT support off-shore.
Some of that has included administration support for its batch processing, though a team also remains in the UK.
Many of these complex systems have unique little quirks. Teams develop “in-house” knowledge, and knowledge is power.
Initial reports seemed to indicate that the fault lay with the support and administration for the batch processing software, some of which was off-shore.
A lack of familiarity with the system also pointed to possible issues in the off-shoring process.
However, in a letter to the Treasury Select Committee, RBS CEO Stephen Hester informed the committee that the maintenance error had occurred within the UK based team.
The other factor is the human need to have an edge on the competition – after all, knowledge is power.
Where functions are outsourced, there are two vital elements that must be focussed on (and all too often are either marginalised or ignored due to costs):
1) Knowledge Transfer
I have worked with many clients where the staff who will be supporting the outsourced services are brought over to learn (often from the people whose jobs they will be replacing).
Do not underestimate what a very daunting and disturbing experience this will be, for both parties concerned.
2) Documentation

Even if jobs are not being outsourced, documentation is often the scourge of the technical support team. It is almost a rite of passage to learn the nuances of complex systems.
Could better processes help?
It is such a negative situation, I think it is worth looking at the effort that went into resolving it.
The issues were made worse by the fact that the team working to resolve the problem could not access the record of transactions that were processed before the batch process failed.
But – the processes do exist for them to recreate the transactions, albeit through lengthy manual intervention.
Teams worked round the clock to clear the backlog, as batches would need to be reconstructed once they worked out where they failed.
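The reconstruction work amounts to replaying every surviving transaction record, in order, against known opening balances. Here is a minimal sketch of that idea, assuming (purely for illustration – the record layout, field names and figures are invented, not RBS’s actual systems) that the transactions survive in an audit log even though the schedule that would have applied them was wiped:

```python
from decimal import Decimal

# Hypothetical surviving transaction log; layout and values are invented.
transaction_log = [
    {"seq": 1, "account": "A", "amount": Decimal("250.00")},   # salary in
    {"seq": 2, "account": "A", "amount": Decimal("-40.00")},   # bill out
    {"seq": 3, "account": "B", "amount": Decimal("100.00")},   # transfer in
]

def replay(log, opening_balances):
    """Rebuild balances by applying logged transactions in sequence order."""
    balances = dict(opening_balances)
    for txn in sorted(log, key=lambda t: t["seq"]):
        acct = txn["account"]
        balances[acct] = balances.get(acct, Decimal("0")) + txn["amount"]
    return balances

print(replay(transaction_log, {"A": Decimal("10.00")}))
# {'A': Decimal('220.00'), 'B': Decimal('100.00')}
```

Applying the records in the right order matters: replaying them out of sequence can pass the same total through an account but show incorrect intermediate balances, which is exactly what triggers wrongly declined payments.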
In Ulster Bank’s case, they were dependent on some NatWest systems, so something somewhere must dictate the order in which to recover; otherwise people would be trying to update accounts all over the place.
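That ordering problem is a classic dependency sort. As a sketch (the dependency edges here are my reading of the reported situation, not a published architecture), recovering systems in topological order guarantees that nothing is rebuilt before the systems it relies on:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each bank maps to the systems it needs
# restored first. Edges are illustrative, based on press reports only.
deps = {
    "NatWest": {"RBS"},          # NatWest recovery needed RBS core batch
    "Ulster Bank": {"NatWest"},  # Ulster Bank relied on NatWest systems
}

recovery_order = list(TopologicalSorter(deps).static_order())
print(recovery_order)   # ['RBS', 'NatWest', 'Ulster Bank']
```

The computed order matches the sequence reported at the time – RBS first, then NatWest, with Ulster Bank last – which is also why Ulster Bank customers were still affected into July.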
Could adherence to processes have prevented the issue in the first place?
Well, undoubtedly. After all, this is not the first time the support teams will have updated their batch software, nor will it have been the first time they have backed out a change.
Will they be reviewing their procedures?
I would like to hope that the support teams on- and off-shore are collaborating to make sure that processes are understood and that documentation is bang up to date.
What can we learn from this?
Apart from maybe putting our money under the mattress, I think this has been a wake-up call for many people who, over the years, have put all their faith in the systems that allow us to live our lives.
Not only that: in an environment where people may well have seen their own jobs targeted by outsourcing, it was a rude awakening to the risks of shifting support for complex, integrated systems without effective training, documentation and, more importantly, back-up support.
Prior to Mr Hester’s written response to the Treasury Select Committee, I had no problem believing that factors such as poor documentation or handover, and a remote team’s unfamiliarity with a system, could have resulted in the mistaken wipe of a schedule.
What this proves is that anyone, in ANY part of the world can make a mistake.