Incident Timeline:
Incident Root Causes:
Incident Impact:
While the Engine Yard Cloud Dashboard was offline, no customers were able to view or manage their environments through either the Dashboard or the API, and so were unable to make environment or application changes. Running instances outside the subset of failed instances in the single affected AZ of US-East-1 were unaffected, so the majority of customer applications were not impacted. For customers with instances in the affected AZ, application impact depended on the role of the impacted instances: failures of database and application master instances resulted in application downtime, whilst failures of slave instances most likely did not. EY Support staff worked with customers to restore failed instances where practically possible within the limitations of the AWS issues.
Incident Corrective Actions:
Engine Yard will be working to strengthen the platform environments to ensure the highest resilience across all components and to minimise the disruption from any future infrastructure failures.