Uptime Guarantees Are Misleading
You’ve more than likely seen the phrase “99% Uptime Guaranteed” when searching for a provider to host your website or online services, and it can sound really appealing. A guaranteed uptime can take the weight off your shoulders, especially if you’ve been managing servers on your own. But what exactly does this term mean, and why aren’t more providers offering 100% uptime? The answer is not as simple as you may think.
Companies usually formalize these guarantees in a Service Level Agreement (SLA), a legal document that, like the Terms of Service, most people don’t read before agreeing to. This document contains the language that lays out what the 99% guarantee means, how the company arrives at that figure, and what happens if it fails to meet it. More importantly, it also spells out what the customer is responsible for when the company misses the uptime guarantee.
This isn’t always straightforward, unfortunately. Having an SLA in place is common practice in the industry and actually benefits both parties if executed properly. However, like any legal document, it is easy to hide nefarious terms and clauses.
100% Uptime Might Not Mean Always Online
Within the SLA, you’ll find where the uptime is defined. This is important because providers sometimes define uptime differently. For instance, Cloudflare’s Business SLA defines 100% uptime as serving “Customer Content 100% of the time without qualification” and calculates the credits applied to the account with the following formula: “(Outage Period minutes * Affected Customer Ratio) ÷ Scheduled Availability minutes.” By contrast, Atlantic.net defines Network Uptime as “the network” being “available 100% of the time in a given month” and applies credits to the account “if it takes [them] more than 30 minutes to resolve the network issue from the time the trouble ticket is opened.”
The common denominator between these Service Level Agreements is that they promise credits to your account if and when services become unavailable, rather than promising that the services will always be available. Credits are calculated differently from provider to provider, and more often than not you must request them directly from the provider before they are applied to your account.
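To make the quoted credit formula concrete, here is a minimal sketch that evaluates it for one made-up outage. The function name, the parameter names, and the sample numbers are illustrative assumptions, not taken from any provider’s documentation, and how the resulting ratio is converted into an actual credit is governed by the SLA itself and is not modeled here.

```python
def service_credit_ratio(outage_minutes, affected_customer_ratio,
                         scheduled_availability_minutes):
    """Evaluate the quoted formula:
    (Outage Period minutes * Affected Customer Ratio) / Scheduled Availability minutes.

    All names here are hypothetical; only the arithmetic comes from the quoted text.
    """
    if scheduled_availability_minutes <= 0:
        raise ValueError("scheduled availability must be positive")
    if not 0 <= affected_customer_ratio <= 1:
        raise ValueError("affected customer ratio must be between 0 and 1")
    return (outage_minutes * affected_customer_ratio) / scheduled_availability_minutes


# Hypothetical scenario: a 90-minute outage that affected half of all visitors,
# measured against a 30-day month (43,200 scheduled minutes).
ratio = service_credit_ratio(90, 0.5, 30 * 24 * 60)
print(f"credit ratio: {ratio:.4%}")
```

Under these assumed numbers the ratio works out to roughly a tenth of one percent, which illustrates how small a credit can be relative to the disruption of a 90-minute outage.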
Uptime When It’s Calculated In “9’s”
Arguably the most common way to express uptime is in “9’s”, as in 99.9%, 99.99%, and so on. While the gap between these figures may look trivial, each additional 9 has a big impact on the total downtime allowed for a business’s online services. See the table below (credit to Jack Stromberg) for an example:
|Availability|Downtime Per Year|Downtime Per Month (30 Days)|Downtime Per Week|Downtime Per Day|
|---|---|---|---|---|
|99% (“Two Nines”)|3.65 Days|7.20 Hours|1.68 Hours|14.4 Minutes|
|99.5%|1.83 Days|3.60 Hours|50.4 Minutes|7.2 Minutes|
|99.8%|17.52 Hours|86.4 Minutes|20.16 Minutes|2.88 Minutes|
|99.9% (“Three Nines”)|8.76 Hours|43.2 Minutes|10.1 Minutes|1.44 Minutes|
|99.95%|4.38 Hours|21.6 Minutes|5.04 Minutes|43.2 Seconds|
|99.99% (“Four Nines”)|52.56 Minutes|4.32 Minutes|1.01 Minutes|8.64 Seconds|
|99.995%|26.28 Minutes|2.16 Minutes|30.24 Seconds|4.32 Seconds|
|99.999% (“Five Nines”)|5.26 Minutes|25.9 Seconds|6.05 Seconds|864 Milliseconds|
|99.9999% (“Six Nines”)|31.5 Seconds|2.59 Seconds|604.8 Milliseconds|86.4 Milliseconds|
As you can see, the difference in downtime allowed per year between “two nines” and “six nines” is roughly 3.65 days! This is rarely disclosed upfront when comparing service providers; it is typically buried within the SLA or only found through your own independent research. There are ways to improve upon this reliability statistic, even beyond what is implemented at the service provider level, which we will explore below. The important thing to remember is that although the difference between 99% and 99.99% may not seem like a lot at first glance, the math tells a different story.
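The arithmetic behind these downtime figures is simple enough to check yourself. The sketch below assumes a 365-day year and a 30-day month; the constant and function names are illustrative, and a table built with a different month length will differ slightly in the monthly column.

```python
# Length of each period in seconds (assumed: 365-day year, 30-day month).
PERIOD_SECONDS = {
    "year": 365 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "week": 7 * 24 * 3600,
    "day": 24 * 3600,
}


def allowed_downtime_seconds(availability_percent):
    """Return the downtime (in seconds) that an availability percentage permits per period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return {period: seconds * unavailable_fraction
            for period, seconds in PERIOD_SECONDS.items()}


for percent in (99.0, 99.9, 99.99):
    downtime = allowed_downtime_seconds(percent)
    print(f"{percent}%: {downtime['year'] / 3600:.2f} hours per year, "
          f"{downtime['day']:.2f} seconds per day")
```

Running a provider’s advertised percentage through a calculation like this before signing up shows how much downtime the guarantee actually permits.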
Achieving True High-Availability
There are many ways to improve reliability and achieve high availability. They range from fully hands-on approaches like LINBIT’s DRBD, which replicates the storage of a Linux system across an HA cluster, to fully managed solutions like Shorey IT’s Uplink program, which takes the guesswork out of web hosting and is covered under a 100% Uptime SLA. Cluster management systems like Kubernetes have become more popular in recent years because they help with scaling and reliability while offering features like self-healing and load balancing.
Shorey IT is more than happy to help you achieve high availability with whatever IT services you currently use, or to help you migrate to a different cloud provider that offers a more appealing Service Level Agreement. Be on the lookout for hidden terms and clauses, and remember: 100% Uptime does not mean that the service will always be available.
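As a rough illustration of why redundancy helps, the sketch below estimates the combined availability of several replicas of the same service. It rests on strong simplifying assumptions that real clusters rarely satisfy: replicas fail independently of one another, failover is instant and flawless, and the service stays up as long as at least one replica is up. The function name is hypothetical.

```python
def combined_availability(node_availability, replicas):
    """Estimate availability of `replicas` redundant nodes.

    Assumes independent failures and perfect failover: the service is down
    only when every replica is down at the same time.
    """
    if not 0 <= node_availability <= 1:
        raise ValueError("node availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    return 1 - (1 - node_availability) ** replicas


# Two nodes that are each available 99% of the time.
print(f"{combined_availability(0.99, 2):.4%}")
```

Under these idealized assumptions, two “two nines” nodes together approach “four nines”. In practice shared dependencies such as power, networking, or a bad deployment reduce the benefit, so treat the result as an upper bound rather than a prediction.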