Issue with SSL certificates
Incident Report for Metabase Cloud
Postmortem

Summary

On Thursday Nov 9th the default certificate for the external Nginx ingress was replaced with a dummy certificate. Users attempting to reach their hosted instances would get an invalid certificate error. This was due to a misconfiguration in the new CI for releasing our helm charts that used the internal Nginx ingress values files for the external Nginx ingress.

Impact

All hosted customers not using custom domains were affected.

How was the root cause diagnosed?

We identified that the values were properly set in the values files. We Investigated the CI steps to determine why the correct values were not set. It was determined that the wrong values files were being used for external Nginx ingress.

How we’ll make this not happen again?

  • Update staging and production value files to match.
  • Add alert to pagerduty for non-custom domains (catch alerts faster)
Posted Nov 15, 2023 - 13:21 UTC

Resolved
Ingress controller certificates are now back to normal. You should be able to get the correct certificate in the browser now. We're very sorry for the inconvenience, we'll publish a retrospective about the issue in the next few days.
Posted Nov 09, 2023 - 20:02 UTC
Investigating
We're investigating an issue in our cloud with certificate management that's causing browsers to receive incorrect certificates
Posted Nov 09, 2023 - 19:37 UTC
This incident affected: Metabase Cloud Platform.