The Catastrophic Facebook Outage

On 4 October 2021, “Sorry, something went wrong” greeted everyone trying to reach Facebook. The outage made Facebook inaccessible for roughly six hours and also took down Facebook-owned services such as Messenger, Instagram, and WhatsApp. The disruption was felt worldwide.

Massive Failure

Theories about what had gone wrong spread quickly, and for a while nobody outside the company knew what had happened. Was it an external hack, a malicious inside job, or a catastrophic mistake? It turned out to be the last of these.

In a statement, Facebook explained that configuration changes on the backbone routers that coordinate network traffic between its data centers caused issues that interrupted this communication. That disruption cascaded through the way its data centers communicate, bringing its services to a halt.

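Fortunately, Facebook fixed the problem by sending employees to physically access the data centers, where they carried out specific instructions from senior engineers. On-site work was needed because the outage had also cut off the remote access engineers would normally use to make such repairs.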

Understanding the Outage

The outage traced back to a routine configuration change that went wrong and affected Facebook's Border Gateway Protocol (BGP) routes. BGP is the protocol that independent networks on the internet use to tell one another which blocks of IP addresses they can reach. Each BGP router builds routing tables from these announcements and selects the best available route to each destination, and packets are then forwarded hop by hop according to those tables. In effect, BGP tells the internet which path to take through various networks to reach a final destination.
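To make the routing-table idea concrete, here is a minimal Python sketch of a BGP-style table. It is a toy, not how real routers work: it assumes the best route is simply the one with the shortest AS path (real BGP applies a longer decision process involving other route attributes), and the `Route` and `RoutingTable` names, the documentation-range prefixes, and the AS numbers are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import IPv4Network, ip_address, ip_network


@dataclass(frozen=True)
class Route:
    """One advertised path to a block of addresses (hypothetical toy type)."""

    prefix: IPv4Network       # destination block this route covers
    as_path: tuple[int, ...]  # autonomous systems the path crosses, nearest first
    next_hop: str             # neighbour that packets are handed to


class RoutingTable:
    """Toy BGP-style table: stores every advertised route, picks one per prefix."""

    def __init__(self) -> None:
        self.routes: dict[IPv4Network, list[Route]] = {}

    def advertise(self, route: Route) -> None:
        self.routes.setdefault(route.prefix, []).append(route)

    def best_route(self, prefix: IPv4Network) -> Route | None:
        candidates = self.routes.get(prefix, [])
        if not candidates:
            return None
        # Simplified selection: shortest AS path wins; ties broken by next-hop name.
        return min(candidates, key=lambda r: (len(r.as_path), r.next_hop))

    def lookup(self, destination: str) -> Route | None:
        addr = ip_address(destination)
        covering = [p for p, rs in self.routes.items() if rs and addr in p]
        if not covering:
            return None
        # Longest-prefix match: the most specific block containing the address wins.
        return self.best_route(max(covering, key=lambda p: p.prefixlen))


if __name__ == "__main__":
    table = RoutingTable()
    net = ip_network("203.0.113.0/24")
    table.advertise(Route(net, (64500, 64501, 64502), "peer-a"))
    table.advertise(Route(net, (64510, 64502), "peer-b"))
    best = table.lookup("203.0.113.7")
    print(best.next_hop if best else "no route")  # peer-b (shorter AS path)
```

Because lookups use longest-prefix match, a more specific announcement such as a /25 inside that /24 would take priority for the addresses it covers, regardless of AS path length.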

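Once the faulty change took effect, Facebook's BGP announcements were withdrawn, so the rest of the internet no longer had any route to Facebook's networks, including the servers that answer DNS queries for its domains. Devices everywhere could no longer find or reach Facebook's services, which turned the problem into a global outage.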
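The effect of withdrawing routes can be sketched with an even simpler toy model in Python. It assumes a hypothetical origin AS number and documentation-range prefixes standing in for a large network's address space, and it ignores everything about BGP except whether a prefix is currently announced. The `InternetView` class and its methods are illustrative names, not part of any real routing software. The point it shows is that servers behind a withdrawn prefix can still be running, yet no other network has a route to them.

```python
from __future__ import annotations

from ipaddress import IPv4Network, ip_address, ip_network


class InternetView:
    """Toy model of what other networks know: which prefixes are announced, and by whom."""

    def __init__(self) -> None:
        self.announced: dict[IPv4Network, int] = {}  # prefix -> origin AS number

    def announce(self, prefix: str, origin_as: int) -> None:
        self.announced[ip_network(prefix)] = origin_as

    def withdraw_all_from(self, origin_as: int) -> None:
        # Collect first, then delete, so the dict is not mutated while iterating.
        for prefix in [p for p, o in self.announced.items() if o == origin_as]:
            del self.announced[prefix]

    def reachable(self, address: str) -> bool:
        # An address is reachable only if some announced prefix covers it.
        addr = ip_address(address)
        return any(addr in prefix for prefix in self.announced)


if __name__ == "__main__":
    view = InternetView()
    # Hypothetical AS numbers and documentation-range prefixes.
    view.announce("192.0.2.0/24", 64500)     # stands in for where DNS servers live
    view.announce("198.51.100.0/24", 64500)  # stands in for web front ends
    view.announce("203.0.113.0/24", 64511)   # an unrelated network

    print(view.reachable("192.0.2.53"))   # True
    view.withdraw_all_from(64500)         # every announcement from AS 64500 disappears
    print(view.reachable("192.0.2.53"))   # False: no announced route covers it
    print(view.reachable("203.0.113.9"))  # True: the unrelated network is unaffected
```

Nothing about the servers themselves changes in this sketch; only the announcements disappear, and that alone is enough to make them unreachable from everywhere else.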

Outage Implications

During the pandemic, businesses came to rely on these platforms more than ever, not only for marketing but also as critical communication infrastructure, because so many interactions and payments had moved online.

The outage showed how easily these systems can succumb to technical failure. A great deal now depends on the meticulous maintenance of servers and networks, because a single technical malfunction can have severe consequences for the businesses that rely on them. Facebook's own market value fell by billions of dollars on the day of the outage.

Takeaway Points

  • A faulty configuration change affecting Facebook's BGP routes made Facebook and Facebook-owned platforms unavailable for roughly six hours.
  • The disconnection had global ramifications.
  • Safeguards and contingencies are needed to prevent future outages.

