Amazon has issued a detailed post mortem of the firm’s Amazon Web Services down time, apologising for the outage and offering a sevice credit to affected customers.
The summary of the service distribution runs to several thousand words and is thick with technical detail, explaining that the disruption began following a network configuration change which incorrectly shifted traffic to the wrong network router.
Amazon went on to document progressive problems as the cloud service began a ‘re-mirroring storm’ with the requests quickly overloading capacity that increased ‘error rates and latencies’ across the entire region. Major websites like Quora, Foursquare and Reddit were taken down by the disruption.
Amazon said that in future it would "audit the change process" relating to network configuration changes and would also invest in enhancements for the recovery of data in the event of such problems.
"Last, but certainly not least, we want to apologize. We know how critical our services are to our customers’ businesses and we will do everything we can to learn from this event and use it to drive improvement across our services," the AWS team wrote.