What businesses can learn from the Fastly CDN outage
On Tuesday evening, for a time, no one could reach the New York Times, the BBC, Trade Me, Radio New Zealand, Pinterest or Reddit, and even the UK government website had gone down. For nearly an hour, the internet was a much quieter place.
For many people, the global internet outage was the first time they had heard of content delivery networks (CDNs), and the moment they realised that seemingly independent, completely separate websites can all be taken down by the same glitch.
The culprit was Fastly, an edge cloud platform that includes a CDN service used by many global websites.
CDNs are networks of servers and data centres that speed up the delivery of assets such as images, scripts and video, so internet content loads faster. Ironically, Fastly’s cloud technology is designed to avoid precisely the kind of problem it ended up causing earlier this week.
Essentially, Fastly places infrastructure closer to where customers need it, so that sites respond faster. The service is also designed to help avoid common causes of online outages, such as distributed denial-of-service (DDoS) attacks, which flood a website with traffic, or a sudden legitimate spike in visitors. Fastly handles that load by routing traffic through “nodes” to spread it out and prevent bottlenecks.
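To make the idea of routing through nodes more concrete, here is a minimal Python sketch of how a request might be steered to the nearest edge node that is healthy and not overloaded. It is an illustration under simplifying assumptions, not how Fastly actually routes traffic: the `EdgeNode` class, the node names, the latency figures and the 80% load threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str           # hypothetical node label, e.g. a city
    latency_ms: float   # round-trip time from the visitor to this node
    load: float         # fraction of capacity in use, 0.0 to 1.0
    healthy: bool = True


def pick_node(nodes: list[EdgeNode], max_load: float = 0.8) -> EdgeNode:
    """Return the lowest-latency node that is healthy and below max_load."""
    candidates = [n for n in nodes if n.healthy and n.load < max_load]
    if not candidates:
        # Nothing can serve the request; a real service would answer with an error
        raise RuntimeError("no edge node available")
    return min(candidates, key=lambda n: n.latency_ms)


# Usage with made-up figures: the closest node is too busy, so the next closest wins
nodes = [
    EdgeNode("auckland", latency_ms=8, load=0.95),
    EdgeNode("sydney", latency_ms=30, load=0.40),
    EdgeNode("los-angeles", latency_ms=130, load=0.20),
]
print(pick_node(nodes).name)  # sydney
```

With the Auckland node above its load threshold, the request goes to Sydney; if Sydney were also marked unhealthy, it would fall through to Los Angeles. Spreading requests this way is what keeps a single busy location from becoming a bottleneck.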
On Tuesday, a single customer configuration change reportedly brought it all crashing down: websites that use Fastly stopped loading, and users reported seeing “Error 503 Service Unavailable”.
On Wednesday, Fastly’s Senior Vice President of Engineering and Infrastructure, Nick Rockwell, explained in a blog post that the outage was triggered by a customer changing their settings.
“On May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances,” Rockwell wrote.
“Early June 8, a customer pushed a valid configuration change that included the specific circumstances that triggered the bug, which caused 85% of our network to return errors.”
“We detected the disruption within one minute, then identified and isolated the cause, and disabled the configuration. Within 49 minutes, 95% of our network was operating as normal,” he added.
Rockwell described the outage as “broad and severe”. The company has since rolled out a fix for the bug, apologised, and admitted that it “should have anticipated it”.
The outage made many people realise just how many websites rely on the same services to stay online, and how a relatively small number of companies hold the keys to much of the internet’s infrastructure.
It is not the first time an outage like this has caused serious disruption. Back in 2017, for example, Amazon Web Services suffered an outage in its US East Coast region that kept some major websites offline for several hours.
It also won’t be the last time – and it’s a good reminder for businesses to prepare for disruption like this.
Businesses of all sizes need to prioritise future-proofing against potential outages. The Fastly outage, much like recent headlines about cyberattacks, is a good reminder of that.
Here are some things businesses should consider in the aftermath of this incident:
Plan for disruption
Whether it’s online outages or supply chain issues, this is your reminder to make sure you have a plan in place should your business face any disruption.
Consider decentralising some services
Where possible, avoid relying heavily on a single provider, to minimise the risk of disruption.
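One way to reduce dependence on a single provider is to keep a second delivery path and fail over to it when the first returns server errors. The minimal Python sketch below shows that logic at the application level, assuming the same content is already served through two providers. The provider URLs, the `fetch_with_failover` helper and the `fake_fetch` stand-in are all hypothetical, and in practice multi-CDN failover is more often handled at the DNS or traffic-management layer.

```python
from typing import Callable

# A fetcher takes a URL and returns (status_code, body). Real code might wrap an
# HTTP client; it is passed in here so the failover logic can be tried offline.
Fetcher = Callable[[str], tuple[int, str]]


def fetch_with_failover(path: str, providers: list[str], fetch: Fetcher) -> tuple[str, str]:
    """Try each provider's base URL in order and return (provider, body) from
    the first one that answers without a 5xx server error."""
    errors = []
    for base in providers:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            status, body = fetch(url)
        except OSError as exc:  # e.g. connection refused or timed out
            errors.append(f"{base}: {exc}")
            continue
        if status < 500:
            return base, body
        errors.append(f"{base}: HTTP {status}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-in: provider A is down (503), provider B is fine
def fake_fetch(url: str) -> tuple[int, str]:
    if url.startswith("https://cdn-a.example"):
        return 503, "Service Unavailable"
    return 200, "<html>ok</html>"


print(fetch_with_failover("/index.html",
                          ["https://cdn-a.example", "https://cdn-b.example"],
                          fake_fetch))
# ('https://cdn-b.example', '<html>ok</html>')
```

Treating any status below 500 as an answer is a deliberate choice in this sketch: a 404 means the provider is up but the file is missing, so switching providers would not help.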
Invest in infrastructure that supports rapid change
The key to overcoming these issues is having a plan you can put into action quickly when things go wrong, as they inevitably do at one point or another.
Assuming disruption will happen means you will be better prepared to deal with it when it does. If it doesn’t, even better, but you won’t regret being ready.
Umbrellar Powered by Pax8
Get the Cloud, Done Right. Umbrellar Powered by Pax8 is New Zealand's prime Professional and Managed Cloud Services specialist. Recently acquired by Pax8, we're transitioning into something "harder, better, faster, stronger" (thank you, Daft Punk!). Watch this space!