Every business manager focuses on Key Performance Indicators (KPIs), but what are those KPIs actually measuring? For most, the most important component is uptime and the avoidance of outages, the ultimate business-damaging events in the modern economy. Uptime has become the primary measurement because, until now, accurate measurement of availability and quality has been a fanciful hope.
But no matter what is being measured or how those measurements are calculated (and until now, enterprise network monitoring measurements have been spotty), a truly effective integrated risk management stance remains a pipe dream for most companies.
The SLA Truth is…They Don’t Matter: When C-level executives are asked about their risk management, many point to the Service Level Agreements (SLAs) with their key infrastructure vendors. But SLAs are virtually meaningless as a practical mechanism, and not just because they are written to minimize the chances of the vendor ever needing to pay out in any substantial way. Ask yourself, how many times has a vendor gone bankrupt because of SLA payouts?
Put simply, if an outage interrupts the operations of a business, the damage will be deeper than any SLA can possibly help it recover from. If the outage lingers, it can be a true catastrophe for the future prospects of the business. It is not rare for the damage to a company’s customer relationships and brand to prove, in the long run, a defining moment in its failure. There are plenty of examples where the reputation of a business is forever tarnished by a ‘streak of technical bad luck’.
There’s no SLA for a Failed Business: SLA payments won’t bring back your customers who were rendered inoperable by a failure. They won’t turn around the prospects of a company whose customers lose trust in them. A loss of availability translates to a loss in revenue. A loss of critical business data translates to a lost business. A “100% SLA” is a sales tool, not a risk management tool.
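To make the gap between an SLA payout and the real cost of an outage concrete, here is a minimal sketch of the arithmetic. Every figure in it (the monthly fee, the credit percentage, the hourly revenue, the outage length) is a hypothetical assumption chosen only for illustration, and the function names are invented for this example. Actual SLA terms and revenue figures vary widely.

```python
def sla_credit(monthly_fee: float, credit_pct: float) -> float:
    """Service credit paid by the vendor, as a percentage of the monthly fee."""
    return monthly_fee * credit_pct / 100


def outage_revenue_loss(revenue_per_hour: float, outage_hours: float) -> float:
    """Revenue lost while the business is unavailable (direct loss only)."""
    return revenue_per_hour * outage_hours


if __name__ == "__main__":
    # Hypothetical inputs, not taken from any real contract or incident.
    credit = sla_credit(monthly_fee=10_000, credit_pct=10)
    loss = outage_revenue_loss(revenue_per_hour=50_000, outage_hours=4)

    print(f"SLA credit received: {credit:,.0f}")
    print(f"Direct revenue lost: {loss:,.0f}")
    print(f"Loss is {loss / credit:,.0f}x the credit")
```

Even under these made-up numbers the sketch counts only direct revenue. It leaves out the harder-to-quantify damage to customer trust and brand described above.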
This is not theoretical. It happens more often than you think; it is simply in the best interest of both provider and customer to keep such incidents as quiet as possible. In one recent example, the data center of a European cloud provider experienced a fire. The provider’s customers who didn’t have a well-designed backup plan went down hard.
In another instance, a bank was plagued by DDoS attacks, leading to service interruptions that infuriated customers. The only link connecting its backend was a 10-megabit telecom link that had been purchased 15 years earlier. When 40,000 megabits of digital garbage was aimed at it, there was little anyone could do to rectify the situation.
The Recipe for Risk-Management Success: True risk management involves taking ownership of the company’s infrastructure and implementing tools for monitoring and observability. Savvy risk managers evaluate everything. They utilize multiple vendors who truly know what they’re doing and have backup plans for their backup plans. They stay up to date on new technologies from cutting-edge vendors, all the while monitoring what potential adversaries in the black hat world are doing.
When a company has true integrated risk management, SLAs become what they already are for the vendor: a negotiating point. True resiliency allows a purchaser to surprise a vendor by saying “I really don’t need that 100% SuperTitanium SLA that allows me to call your CEO at 2AM. Let’s talk price instead.”
Metrics are the Missing Link: Most companies focus mainly on uptime metrics because they lack reliable, predictive, and actionable latency data that would give them insight into their users’ digital experience. Combining reliability monitoring with true end-to-end latency data, so that the measurements closely replicate the real user experience, would be invaluable to any product owner or technology executive.
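As a rough illustration of what combining uptime with latency can look like, the sketch below summarizes a batch of synthetic probe results into three numbers: availability, 95th-percentile latency, and the share of requests that both succeeded and were fast enough. The `Probe` type, the `summarize` function, and the 500 ms threshold are assumptions made up for this example. They do not describe any particular monitoring product.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    ok: bool            # did the request succeed?
    latency_ms: float   # end-to-end latency observed by the probe


def summarize(probes: list[Probe], slow_ms: float = 500.0) -> dict[str, Optional[float]]:
    """Combine reliability and latency into one view of user experience."""
    if not probes:
        raise ValueError("need at least one probe result")

    total = len(probes)
    latencies = sorted(p.latency_ms for p in probes if p.ok)

    # Classic uptime view: fraction of probes that succeeded at all.
    availability = len(latencies) / total

    # Nearest-rank 95th percentile over successful probes only.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1] if latencies else None

    # Experience view: succeeded AND answered within the threshold.
    good = sum(1 for ms in latencies if ms <= slow_ms)

    return {
        "availability": availability,
        "p95_ms": p95,
        "good_experience": good / total,
    }


if __name__ == "__main__":
    # Synthetic data: 8 fast successes, 1 slow success, 1 failure.
    sample = [Probe(True, 100.0)] * 8 + [Probe(True, 900.0), Probe(False, 0.0)]
    print(summarize(sample))
```

On this synthetic sample the service looks 90% available, yet only 80% of requests delivered an acceptable experience. That difference is the kind of gap an uptime-only metric hides.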
Data Puts You in Control: A commitment to true risk management, and the need for customers to take ownership of their infrastructure, are why I co-founded LynkState. My partners and I thought about all the information that would have helped us do our jobs better during our network architecture and engineering careers, and we have found a way to deliver it to our customers.
The data that puts you in control of your integrated risk management is finally here. It is not about the SLAs. It is about YOU knowing what’s going on, not just in your infrastructure and on your network, but on the global internet.