Copyright Newsweek

In the early hours of Monday morning, a large part of the internet suddenly stopped working. The global outage, which affected millions of people around the world, originated in an area of roughly 385 acres in Northern Virginia.
When Amazon Web Services (AWS) was hit with the catastrophic outage, it brought down major digital services, apps, and websites including Snapchat, Roblox, Fortnite, Ring, Coinbase, Venmo, Robinhood, Hinge, and Reddit. The cloud computing platform said that the core issue was an “operational” incident in its US-EAST-1 region, a major hub for its cloud services and a site involved in previous outages.
Technology analysts and AWS’s own messages pointed to failures in Domain Name System (DNS) resolution and networking systems within the database infrastructure as likely contributors. AWS has since said that all of its services have “returned to normal operations,” although it warned that some, such as Config, Redshift, and Connect, would continue to work through a backlog of messages for a few hours on Monday afternoon.
The outage underscored a central trade-off of cloud computing: while it lets businesses deploy global services without maintaining vast infrastructure, it concentrates risk. A problem in a single region, such as Northern Virginia, can cause widespread, simultaneous outages for unrelated companies worldwide.
Virginia has the most data centers in the United States (663), according to infrastructure watchdog DataCenterMap. There are currently 4,005 active data centers registered in the U.S., and Newsweek has mapped them.
The Internet Trade-off
Companies often choose to use services like AWS instead of building their own in-house computing systems, which would be far more complex and expensive.
As Hisham Ibrahim, Chief Community Officer at RIPE NCC, Europe’s regional internet registry, explained to Newsweek: “Companies providing online services will often decide to use large cloud platforms rather than building and maintaining their own global Internet infrastructure.”
“This allows them to deliver services worldwide without having to operate all that complexity themselves,” he added. “The trade-off is increased centralization: when a major cloud provider or region experiences a problem, it can take a wide range of unrelated services offline at once.”
Amazon is the leading provider of these services, with more than 41 percent of the market, followed by Google and then Microsoft, according to market research group Gartner.
Amazon’s Northern Virginia hub is the oldest and biggest in the country, Doug Madory, director of internet analysis at Kentik, told the Associated Press.
“For a lot of people, if you’re going to use AWS, you’re going to use US-East-1 regardless of where you are on Planet Earth,” Madory said. “We have this incredible concentration of IT services that are hosted out of one region by one cloud provider, for the world, and that presents a fragility for modern society and the modern economy.”
Vulnerability
Monday’s incident exposed how dependent the internet is on these concentrated hubs.
“Outages like this can ripple outward in a number of ways,” Ibrahim said. “Some services are hosted entirely within the affected cloud environment and so they simply become unavailable during an outage.”
“Other services might rely on critical components such as databases, authentication systems, or content delivery tools that happen to run there,” he continued. “When those can’t be reached, performance degrades or the wider service fails.”
Ibrahim also explained how such problems can compound. “These issues can also cascade, as systems automatically shift traffic elsewhere and create unexpected load or other effects,” he said.
But Ibrahim stressed that, despite the number of major services knocked offline by a single outage, the internet as a whole is far bigger. “While Monday’s incident highlights what can happen when many online services are concentrated within relatively few large cloud platforms, the Internet itself – the global network of 76,000 interconnected networks that all of this runs on – was unaffected and continued to work as intended,” he said. “So while the people running some of these services might have had a bad day today, the Internet itself was doing just fine.”