Error: An invalid API response was returned

I'm also currently blocked on development due to this issue, but thanks for taking care of it, Prismic team! :smiley:

2 Likes

We have identified that the incident is likely caused by something outside of the Prismic domain. We are in contact with our service provider and actively investigating this issue with them.

8 Likes

Thanks Phil for keeping us updated. Hope it will get resolved soon as our dev is blocked at the moment :smiling_face_with_tear:

1 Like

Luckily our website is cached with ISR + SSG with Next.js on Vercel, so it hasn't caused any impact yet. I really hope you find a solution as soon as possible. Best of luck to the Prismic team #HugOps

P.s. please keep us updated more frequently if anything changes.

2 Likes

We have identified a network issue related to a service managed by our infrastructure provider.
We are in contact with our service provider who are observing and investigating a drop in connections to our storage system.
In parallel, we are actively working on a backup plan to bypass this issue by utilizing different infrastructure.

4 Likes

Why is the front-facing marketing website down? Isn't the website built statically?

The outage was due to a change that one of our Service Providers (AWS) released to our infrastructure.
We have now worked around this unplanned change. The API is now getting back to a normal state, but it will take a few minutes before the service is fully up again.
We are ensuring that further changes to our infrastructure by our providers won't affect the reliability of our API.

Thank you again for your patience. I felt the 'dev' love for our pain :heart:

9 Likes

Kris!

Absolutely, it's mostly static apart from some dynamic things for the blog (comments on Supabase, for example).

We hit the issue when we published new content, and therefore automatically called revalidateTag() via the webhook to revalidate all Prismic requests. While I think Next.js should then use the working version from the fetch cache if the revalidation didn't work, it didn't and our pages started to return errors.
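One way to avoid dropping a working cache during an upstream outage is to check that the API responds before revalidating. Below is a minimal sketch of that idea, not a confirmed fix for the behaviour described above. `revalidateIfHealthy` and its types are hypothetical names for illustration; the health check and the revalidation call are passed in so the guard itself has no framework dependency.

```typescript
// Hypothetical guard: only revalidate when the upstream API answers.
// All names here are illustrative; none come from Prismic or Next.js.
type HealthCheck = () => Promise<boolean>;
type Revalidate = () => void | Promise<void>;

interface GuardResult {
  revalidated: boolean;
  reason: string;
}

async function revalidateIfHealthy(
  isUpstreamHealthy: HealthCheck,
  revalidate: Revalidate,
): Promise<GuardResult> {
  let healthy = false;
  try {
    healthy = await isUpstreamHealthy();
  } catch {
    // A thrown error (network failure, timeout) counts as unhealthy.
    healthy = false;
  }

  if (!healthy) {
    // Skip revalidation so the previously cached pages stay in place.
    return { revalidated: false, reason: "upstream unhealthy; cache left untouched" };
  }

  await revalidate();
  return { revalidated: true, reason: "upstream healthy" };
}
```

In a Next.js webhook route handler, this might be called with something like `() => fetch(apiEndpoint).then((r) => r.ok)` as the health check and `() => revalidateTag("prismic")` (from `next/cache`) as the revalidation step. The endpoint variable and the `"prismic"` tag name are assumptions here. Note that the API could still fail between the check and the re-fetch, so this narrows the window rather than closing it.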

Then you would think that rolling back to a confirmed working deploy on Vercel (I checked the deploy link) should work, but as soon as it was promoted it stopped working. I tested multiple old deploys as well.

The only thing I can think of being the issue for us is the edge runtime og-image generation, which relies on the Prismic API to generate og-images on the fly.
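If on-the-fly generation does depend on a live API call, one common mitigation is to wrap that call so it falls back to a static default when the API errors or is slow. This is a rough sketch under that assumption; `withFallback` is a hypothetical helper, and the 2000 ms default timeout is an arbitrary choice.

```typescript
// Hypothetical helper: resolve with `fallback` if `primary` rejects
// or takes longer than `timeoutMs` milliseconds.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: T,
  timeoutMs = 2000,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // This promise resolves with the fallback once the timeout elapses.
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });

  try {
    // Whichever settles first wins: the real call or the timeout.
    return await Promise.race([primary(), timeout]);
  } catch {
    // The primary call failed before the timeout fired.
    return fallback;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

For an og-image route, `primary` would be the function that fetches the document title from the CMS, and `fallback` a generic site title, so the image still renders during an outage.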

1 Like

@samuelhorn Thanks for the explanation, Sam! I actually experienced the exact same issue with one of my clients and I was quick to tell the rest of my clients not to publish any content on Prismic in order not to trigger the revalidate endpoint.

This is a good opportunity to consider whether revalidating is even a good idea in this scenario. I expected that if Prismic errors out, the revalidation would not take effect, so the page would stay at the latest working version. I'd be interested to hear your thoughts after you investigate it on your end.

Glad that it is resolved now!

2 Likes

Thank you!

Final update

Following our recent update on the service interruption, we want to provide you with more details about the incident and our ongoing efforts to ensure system stability.

Incident Overview:
As we previously communicated, the outage was a result of a change implemented by our service provider, AWS. Specifically, our Lambda serverless infrastructure was automatically upgraded to a new Node.js version. This unexpected upgrade introduced a breaking change in the way HTTP headers were managed, leading to a series of errors.

Impact and Response:
The change initiated a snowball effect, generating errors that impacted our services. Our response was to roll back our Lambda serverless infrastructure to the previous stable version. This action successfully resolved the issue, restoring normal service operations.

Current Status and Monitoring:
We have returned to full operational status and are carefully monitoring the recovery of our infrastructure. Our team is actively working to ensure that all systems remain stable and performant.

Future Safeguards:
To prevent a recurrence of this issue, we are taking corrective measures, including:

  • We will review our change management protocols with our service providers.
  • We have already implemented stricter controls for automatic updates.

We understand the critical importance of our services to your operations and are deeply committed to maintaining the highest standards of reliability and performance. We apologize for the disruption caused and appreciate your understanding and continued trust in Prismic.

If you have any specific questions about your repo, please reach out privately in the support portal :pray:

1 Like