Nashville Bombing Part 2
As I said last week, while the bombing is a horrible event, it does point out how brittle our telecommunications world is. That being said, for most companies, the rest of the IT infrastructure is probably more brittle.
Companies should use this as an opportunity to review their situation and see if they can make improvements at a price that is affordable.
While AT&T was able to strike a deal with the City of Nashville to commandeer Nissan Stadium, home of the Titans, to set up a replacement central office, you probably would not get the same treatment if you asked.
AT&T was also able to deploy 25 tractor trailers of computer equipment to replace the equipment that was damaged.
Finally, AT&T was able to temporarily reassign personnel with every skill that they might possibly need from fiber techs to computer programmers. Again, you likely would not be able to do that.
The question for you to “game out” is: which vendors are critical to me, and what would I do if one of them had a meltdown? I don’t mean a 30-minute outage; I mean a meltdown. We have seen, for example, tech companies that have been hit by ransomware.
Perhaps, like many companies, you use a managed service provider or MSP. A number of MSPs have been hit by ransomware, and when that happens, their customers are often hit too. Does your MSP have the resources to defend all (or most of) its customers from a ransomware attack at once? How long would it take your MSP to get you back to work? Even large MSPs (which means many customers) likely don’t have the resources.
If that were to happen to you – and of course, they have the only copies of your data, right? – what would they do and what would you do?
Maybe your servers are hosted in your office. There are a lot of possible events that could occur.
Even if your servers are in a colo, things can occur that can take you down.
Here is one thing to start with –
For each key system from personnel to public web sites, both internal and at third parties, document your RECOVERY TIME OBJECTIVE or RTO. The RTO is the maximum acceptable downtime before recovering. For example, for payroll, it might be 24 hours. But what if the outage happens at noon on the day that payroll must be sent to your bank? So, think carefully about what the maximum RTO is and remember that it will likely be different for different systems.
Then, for each system, document the RECOVERY POINT OBJECTIVE or RPO. The RPO is the amount of data, measured as time counting backward from the event, that you are willing to lose. For example, if this is an ecommerce system, maybe you are willing to lose 30 minutes’ worth of orders. Or maybe 5 minutes. If it is an accounting system, maybe it is 8 hours (rekeying one day’s worth of AR and AP may be considered acceptable). Again, each system will likely be different.
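One way to keep these numbers honest is to write them down in a form you can check against how you actually back things up. The sketch below is only an illustration: the class and function names are made up, and the values marked "hypothetical" are placeholders that do not come from any real system. It records an RTO and RPO per system and tests whether a given backup interval can satisfy the RPO, on the assumption that in the worst case you lose everything since the last good backup.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjective:
    """RTO and RPO for one system (illustrative structure, not a standard)."""
    system: str
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


def backup_meets_rpo(objective: RecoveryObjective, backup_interval: timedelta) -> bool:
    """Worst case, everything since the last backup is lost,
    so the backup interval must not exceed the RPO."""
    return backup_interval <= objective.rpo


objectives = [
    # Payroll RTO of 24 hours is the example from the text; its RPO is hypothetical.
    RecoveryObjective("payroll", rto=timedelta(hours=24), rpo=timedelta(hours=8)),
    # Ecommerce RPO of 30 minutes is from the text; its RTO is hypothetical.
    RecoveryObjective("ecommerce", rto=timedelta(hours=1), rpo=timedelta(minutes=30)),
    # Accounting RPO of 8 hours is from the text; its RTO is hypothetical.
    RecoveryObjective("accounting", rto=timedelta(hours=24), rpo=timedelta(hours=8)),
]

# Assumption for the example: every system is backed up once a night.
nightly = timedelta(hours=24)
for obj in objectives:
    status = "OK" if backup_meets_rpo(obj, nightly) else "GAP"
    print(f"{obj.system}: RPO {obj.rpo}, nightly backup -> {status}")
```

With a nightly backup, every system in this example shows a gap, which is exactly the kind of mismatch you want to find on paper rather than during an outage.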
Then get all of the lines of business, management and the Board (if there is one) to agree on those times. Note that shorter RTOs and RPOs mean increased cost. The business units may say that they are not willing to lose any data. If you tell them that you can do that, but it will cost them a million dollars a year, they may rethink that. Or management may rethink that for them. The key point is to get everyone on the same page.
Once you have done that, make a list of the possible events that you need to deal with.
- Someone plants a bomb in an RV outside your building and there is severe physical damage to your building.
- Or maybe the bomb is down the block, but the force of the blast damages the water pipes in your building.
- Or, the bomb is down the block and there is no damage to your building, but the city has turned off water, power and gas to the building. And the building is inside a police line and will be inaccessible while the police try to figure out what is going on.
- In the case of AT&T, they had to pump three FEET of water out of the building. Water and generators are not a good mix. Neither are water and batteries. While AT&T lost their generators as a result of the blast, their batteries were distributed around the building so they did not lose ALL of their batteries.
Note that you do not need to think up all the scenarios yourself. You can look at the news reports and after-action reports from other big, public meltdowns. Here is another article on the Nashville situation.
Now create a matrix of events and systems for your RTO and RPO numbers. In each intersection box, you can say that you can already meet those objectives, or that meeting them will cost $1.29 one time, or a million dollars a year. You need to include third party providers if they run and manage any systems that are critical to you.
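A spreadsheet is fine for this matrix, but if you prefer something you can script, here is a minimal sketch. Everything in it is an assumption made for illustration: the event names, the system names, the cell fields, and which cells have gaps are all invented. The two cost figures simply reuse the numbers mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One event/system intersection (hypothetical structure)."""
    meets_objectives: bool
    one_time_cost: float = 0.0  # cost to close the gap, paid once
    annual_cost: float = 0.0    # cost to close the gap, paid every year
    owner: str = "internal"     # or the third party that runs the system


# Hypothetical events and systems; replace with your own lists.
matrix = {
    ("building destroyed", "payroll"): Cell(False, annual_cost=1_000_000.0, owner="MSP"),
    ("building destroyed", "ecommerce"): Cell(True, owner="colo"),
    ("utilities cut off", "payroll"): Cell(False, one_time_cost=1.29),
    ("utilities cut off", "ecommerce"): Cell(True, owner="colo"),
}


def unmet(m):
    """Return the (event, system) pairs where RTO/RPO cannot currently be met."""
    return [key for key, cell in m.items() if not cell.meets_objectives]


def mitigation_totals(m):
    """Sum the one-time and annual costs of closing every gap."""
    gaps = [cell for cell in m.values() if not cell.meets_objectives]
    return (sum(c.one_time_cost for c in gaps), sum(c.annual_cost for c in gaps))


for event, system in unmet(matrix):
    cell = matrix[(event, system)]
    print(f"{event} / {system} ({cell.owner}): "
          f"${cell.one_time_cost:,.2f} one time, ${cell.annual_cost:,.2f} per year")

one_time, annual = mitigation_totals(matrix)
print(f"Total to close all gaps: ${one_time:,.2f} one time, ${annual:,.2f} per year")
```

The totals are what you bring to management: the list of gaps, who owns each one, and what it would cost to close them.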
Once you have done all that, you can go back to management and the lines of business and tell them here is the reality – what risk are you willing to accept? This is NOT an IT problem. This is a business problem.
The business will consider the likelihood of the event – even after Nashville, an RV filled with explosives is an unlikely event and the cost to mitigate the problem is likely high. For some systems the cost may be low enough and the risk high enough that management says fix it. For other systems, probably not.
The key point is that everyone, from the lines of business to management to the Board, understands what the risks are and what the mitigation costs are. From this data, they can make an informed BUSINESS decision on what to do.
If you need help with this, please contact us.