PostgreSQL production incident caused by transaction ID wraparound

rastignack · 2026-04-18T21:57:37 1776549457

Just monitor it and you’re done. I’ve delivered and maintained hundreds of pg instances and never faced this issue. There is so much literature about it that at some point no one even slightly skilled will face it.
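For readers wondering what "monitor it" means in practice, here is a minimal sketch. It assumes the rows come from the built-in catalog query shown in `XID_AGE_QUERY`. The `classify` helper and its warning/critical fractions are hypothetical values, not an official recommendation.

```python
# Query to run with any PostgreSQL client; age() and datfrozenxid are built in.
XID_AGE_QUERY = "SELECT datname, age(datfrozenxid) FROM pg_database"

# XIDs are 32-bit and compared circularly, so roughly 2**31 of them can
# separate the oldest unfrozen row from the current transaction.
WRAPAROUND_LIMIT = 2**31


def classify(rows, warn_fraction=0.5, crit_fraction=0.75):
    """Map each database name to 'ok', 'warn', or 'crit'.

    The two fractions are illustrative assumptions; tune them per workload.
    """
    result = {}
    for datname, xid_age in rows:
        used = xid_age / WRAPAROUND_LIMIT
        if used >= crit_fraction:
            result[datname] = "crit"
        elif used >= warn_fraction:
            result[datname] = "warn"
        else:
            result[datname] = "ok"
    return result


# Example rows shaped like the query result (values are made up).
sample = [
    ("app", 150_000_000),
    ("analytics", 1_200_000_000),
    ("legacy", 1_900_000_000),
]
print(classify(sample))
```

Feeding the result into whatever alerting is already in place is left to the reader.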

johnbarron · 2026-04-18T22:49:45 1776552585

>> Just monitor it and you’re done.

This is just an anecdote colliding with documented database behavior, and that behavior is not an issue on Oracle, SQL Server, or IBM DB2.

PostgreSQL explicitly documents xid wraparound as a failure mode that can lead to catastrophic data loss and says vacuuming is required to prevent it. Near exhaustion, it will refuse commands.
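To put rough numbers on "near exhaustion": every transaction that writes consumes one XID, so the remaining runway is simple arithmetic. The sketch below is an illustration under stated assumptions. `days_until_limit` is a hypothetical helper, and its `safety_margin` default mirrors the roughly three million XIDs that recent PostgreSQL documentation gives as the point where new XIDs are refused. That margin has differed between versions, so check your own.

```python
# Circular 32-bit XID comparison leaves about 2**31 usable XIDs of headroom.
XID_HEADROOM = 2**31


def days_until_limit(current_age, xids_per_day, safety_margin=3_000_000):
    """Estimate days left before the server would stop assigning new XIDs.

    current_age:   age(datfrozenxid) of the oldest database
    xids_per_day:  observed XID consumption rate (an input you must measure)
    safety_margin: assumed, version-dependent reserve before the hard stop
    """
    remaining = XID_HEADROOM - safety_margin - current_age
    if remaining <= 0:
        return 0.0
    return remaining / xids_per_day


# Made-up workload: age of 1.5 billion, burning 50 million XIDs per day.
estimate = days_until_limit(1_500_000_000, 50_000_000)
print(f"about {estimate:.1f} days of runway")
```

The estimate only shrinks if nothing freezes old rows in the meantime, which is exactly the situation when vacuuming is not keeping up.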

Small sample of known outages:

Sentry — Transaction ID Wraparound in Postgres

https://blog.sentry.io/transaction-id-wraparound-in-postgres...

Mailchimp / Mandrill — What We Learned from the Recent Mandrill Outage

https://mailchimp.com/what-we-learned-from-the-recent-mandri...

Joyent / Manta — Challenges deploying PostgreSQL (9.2) for high availability

https://www.davepacheco.net/blog/2024/challenges-deploying-p...

BattleMetrics — March 27, 2022 Postgres Transaction ID Wraparound

https://learn.battlemetrics.com/article/64-march-27-2022-pos...

Duffel — concurrency control & vacuuming in PostgreSQL

https://duffel.com/blog/understanding-outage-concurrency-vac...

Figma — Postmortem: Service disruption on January 21–22, 2020

https://www.figma.com/blog/post-mortem-service-disruption-on...

Even AWS updated its recommendation as recently as Feb 2025, and this is an issue in Aurora Postgres as well as in Postgres.

"Prevent transaction ID wraparound by using postgres_get_av_diag() for monitoring autovacuum" https://aws.amazon.com/blogs/database/prevent-transaction-id...

buggymcbugfix · 2026-04-19T19:03:31 1776625411

Anyone else getting AI vibes from this article? It is strangely repetitive and meandering. It also has the tell-tale "It's not X, it's Y" phrasing and is mostly unspecific.

Also, why would you have billions of open transactions? That is the implication I got from the article as someone who doesn't know anything about Postgres.

(I use SQLite and perhaps I have Stockholm syndrome, but I like how it pushes you towards a design with small transactions, ideally entirely database-side.)

EDIT: Yeah, gptzero says AI with 100% confidence.

tcp_handshaker · 2026-04-19T22:09:03 1776636543

Congrats on blaming the messenger, and not bothering to understand the issue.

plasticeagle · 2026-04-18T21:56:54 1776549414

AI;DR

Which is why it's

TL;DR

Boring shit article about obvious problem.

fmajid · 2026-04-18T22:08:15 1776550095

It's not as obvious as you think; GitLab was hit by this a few years ago. But yes, low-quality article and the SQL Server plug is in poor taste.

ozten · 2026-04-18T23:22:01 1776554521

Many SEV-1s are “obvious”. Still feels like a kick in the stomach if you're the one who was responsible LOLz.

fmajid · 2026-04-19T16:12:16 1776615136

I meant "obvious to anyone putting PostgreSQL in production that they have to put specific monitoring in place for this, and palliative measures"

The database shutting itself down and refusing to come back up until a full vacuum or vacuum freeze is performed, which means days of downtime, yes, that's pretty obvious indeed.

jffry · 2026-04-18T21:27:53 1776547673

tl;dr: autovacuum was seen to be active during an earlier incident, assumed to be at fault, and was disabled. It was never re-enabled. The long-term implications of disabling autovacuum were not actively considered.

throwatdem12311 · 2026-04-18T22:03:00 1776549780

TL;DR Don’t turn off autovacuum, and periodically tune your write-heavy tables so they are vacuumed often enough that this never happens.
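One way to catch a forgotten per-table switch is to scan `pg_class.reloptions` for `autovacuum_enabled` set to a false value. This is a minimal sketch, assuming rows shaped like the result of `RELOPTIONS_QUERY`. The helper names are hypothetical, and the list of false spellings covers only the common ones. It does not cover a global `autovacuum = off` setting, which `SHOW autovacuum` would reveal.

```python
# Catalog query to run with any PostgreSQL client; returns table name,
# storage options (text[] or NULL), and the age of the table's frozen XID.
RELOPTIONS_QUERY = """
SELECT c.oid::regclass::text, c.reloptions, age(c.relfrozenxid)
FROM pg_class c
WHERE c.relkind IN ('r', 'm')
"""

# Common spellings of boolean false in storage options (assumed, not exhaustive).
FALSEY = {"false", "off", "no", "0"}


def autovacuum_disabled(reloptions):
    """True if the options list contains autovacuum_enabled set to a false value."""
    for opt in reloptions or []:
        key, _, value = opt.partition("=")
        if key == "autovacuum_enabled" and value.lower() in FALSEY:
            return True
    return False


def risky_tables(rows):
    """Return (table, xid_age) pairs with autovacuum off, oldest first."""
    flagged = [(name, xid_age) for name, opts, xid_age in rows
               if autovacuum_disabled(opts)]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Example rows shaped like the query result (values are made up).
sample = [
    ("public.events", ["autovacuum_enabled=false"], 900_000_000),
    ("public.users", None, 40_000_000),
    ("public.audit_log", ["fillfactor=90", "autovacuum_enabled=off"], 1_400_000_000),
]
print(risky_tables(sample))
```

Sorting by XID age puts the table closest to trouble at the top of the report.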

fallpeak · 2026-04-18T21:40:35 1776548435

TL;DR: Devs didn't know what they were doing and turned off autovacuum and eventually it broke, then the author decided to have an AI slop out an article about the incident which may or may not have actually occurred.

thesh4d0w · 2026-04-18T21:56:03 1776549363

Don't forget to include some slop about why SQL Server is better.