Voice Routing API unavailable
Incident Report for Sound of Data
Postmortem

Summary:

During a new deployment of our voice routing API, a configuration error was introduced that caused all of the instances hosting the voice routing API to refresh at the same time. As a result, the voice routing API became unavailable: the new instances were still in the provisioning phase while the older instances had already been decommissioned.

Corrective measures:

When our alarms indicated that the routing API had become unstable, we immediately started our incident response process with our senior DevOps and engineering teams. The response is set up in three phases: identify, correct, and monitor.

Because we were able to identify the problem quickly, we intervened in the auto scaling process and stopped any further decommissioning of services. We then corrected the deployment and redeployed the software. This corrected the problem, and the newly deployed API became available again.

After that, we did two more deployment runs to make sure that the configuration was indeed correct for any future upgrades.

Future preventative measures:

We’ve adjusted our auto deployment software to check for certain conditions that could trigger this behavior. When it detects any of them, it will block the rollout of the new version.
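
As an illustration only, a pre-deployment check of this kind could look like the minimal Python sketch below. It is not our actual deployment software: the RolloutPlan fields, the function names, and the specific conditions are all hypothetical, chosen to mirror the failure described in the summary (every instance replaced at once, old instances removed before new ones were ready). The rules are deliberately conservative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutPlan:
    """Hypothetical description of one deployment (assumed fields, not a real schema)."""
    total_instances: int    # instances currently serving traffic
    batch_size: int         # instances replaced at the same time
    min_healthy: int        # instances that must keep serving during the rollout
    wait_for_healthy: bool  # remove old instances only after new ones pass health checks


def find_violations(plan: RolloutPlan) -> list[str]:
    """Return the reasons this plan could take the service offline (empty if none)."""
    violations = []
    if plan.batch_size < 1:
        violations.append("batch_size must be at least 1")
    if plan.batch_size >= plan.total_instances:
        violations.append("batch replaces every instance at once")
    if plan.total_instances - plan.batch_size < plan.min_healthy:
        violations.append("rollout would drop below min_healthy")
    if not plan.wait_for_healthy:
        violations.append("old instances may be removed before new ones are healthy")
    return violations


def may_deploy(plan: RolloutPlan) -> bool:
    """Block the rollout if any risky condition is present."""
    return not find_violations(plan)


# A plan that refreshes all six instances simultaneously is rejected,
# while a plan that replaces two at a time and waits for health checks passes.
risky = RolloutPlan(total_instances=6, batch_size=6, min_healthy=4, wait_for_healthy=False)
safe = RolloutPlan(total_instances=6, batch_size=2, min_healthy=4, wait_for_healthy=True)
```

In this sketch, may_deploy(risky) is False and may_deploy(safe) is True. Returning the list of violations rather than a bare yes/no lets the deployment tooling report why a rollout was blocked.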

If you have any additional questions, please contact our customer service team at customerservice@soundofdata.nl

Thomas Hazelaar
CTO
Sound of Data

Posted Apr 22, 2021 - 09:56 CEST

Resolved
All services remain stable and no further API errors have been seen. We consider the routing API to be 100% stable again. A postmortem write-up will follow once we have the full outage report from the engineers.

For more information, please contact our customer service team at customerservice@soundofdata.nl
Posted Apr 13, 2021 - 22:22 CEST
Monitoring
We have deployed a fix and the voice routing API is back online. We are seeing healthy application status again. We will keep monitoring for the next few hours to make sure the application remains stable, and we will follow up with a full postmortem once we have identified the root cause and the steps needed to prevent such an occurrence.

For more information, please contact our customer service team at customerservice@soundofdata.nl
Posted Apr 13, 2021 - 14:45 CEST
Identified
We are experiencing an issue with the availability of our voice routing engine, which handles the routing of voice calls.
Our engineers have identified the problem and are working on a fix.

Next update in 30 minutes, or sooner if new information becomes available.

For more information, please contact our customer service team at customerservice@soundofdata.nl
Posted Apr 13, 2021 - 14:35 CEST
This incident affected: ITSP | Services (Voice Services).