Yes, we are now in the process of adding custom metrics/alerts around the xmin horizon across all of our postgres databases.
We will do a DB-wide VACUUM FULL as well (ironically, this incident started because VACUUM FULL failed last weekend).
Appreciate all the input on this.
Arjun
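A minimal sketch of the kind of xmin-horizon check described above, not the monitoring actually deployed. The query uses the standard `pg_replication_slots` view and the `age()` function. The threshold value and the `stuck_slots` helper are hypothetical names chosen for illustration, and the rows are assumed to come from whatever database driver is already in use.

```python
# Query against the standard pg_replication_slots view; age() reports how many
# transactions old each slot's xmin / catalog_xmin is (NULL if the slot holds none).
SLOT_XMIN_AGE_SQL = """
SELECT slot_name,
       active,
       age(xmin)         AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots;
"""

# Assumed alert threshold, in transactions; tune this for the workload.
WARN_XMIN_AGE = 50_000_000


def stuck_slots(rows, warn_age=WARN_XMIN_AGE):
    """Return the names of slots holding back the xmin horizon past warn_age.

    rows: iterable of (slot_name, active, xmin_age, catalog_xmin_age) tuples,
    as returned by SLOT_XMIN_AGE_SQL; either age may be None.
    """
    flagged = []
    for slot_name, active, xmin_age, catalog_xmin_age in rows:
        # Treat a missing (NULL) age as 0 so it never triggers an alert.
        oldest = max(a for a in (xmin_age, catalog_xmin_age, 0) if a is not None)
        if oldest > warn_age:
            flagged.append(slot_name)
    return flagged
```

An inactive slot with a large age is the "stuck" case from this thread, so an alert could also include the `active` column in its message.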
On Thu, Nov 2, 2017 at 11:06 AM, Stephen Frost <sfrost@snowman.net> wrote:
Tom, Arjun,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Arjun Ranade <ranade@nodalexchange.com> writes:
> > After dropping the replication slot, VACUUM FULL runs fine now and no
> > longer reports the "oldest xmin is far in the past"
> Excellent. Maybe we should think about providing better tools to notice
> "stuck" replication slots.
+1
> In the meantime, you probably realize this already, but if global xmin
> has been stuck for months then you're going to have terrible bloat
> everywhere. Database-wide VACUUM FULL seems called for.
This, really, is also a lesson in "monitor your distance to transaction wrap-around".. You really should know something is up a lot sooner than the warnings from PG showing up in the logs.
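One way to track the distance to wraparound is to sample `age(datfrozenxid)` per database and alert well before the server itself starts warning. The sketch below assumes that approach. The query uses the standard `pg_database` catalog, while the helper names and the warning/critical fractions are hypothetical and would need tuning.

```python
# age(datfrozenxid) is the number of transactions since each database's
# oldest unfrozen XID; the largest value is the one closest to wraparound.
DATFROZENXID_AGE_SQL = """
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
"""

# XIDs are 32-bit and compared modulo 2^32, so roughly 2^31 transactions can
# be "in the past". PostgreSQL stops assigning new XIDs somewhat before this
# point, so treat the headroom computed here as an upper bound.
XID_WRAP_LIMIT = 2**31


def wraparound_headroom(xid_age):
    """Transactions remaining before xid_age reaches the 2^31 limit."""
    return XID_WRAP_LIMIT - xid_age


def classify(xid_age, warn_fraction=0.5, crit_fraction=0.75):
    """Map an XID age to 'ok', 'warning', or 'critical'.

    The fractions are assumed thresholds, expressed as a share of
    XID_WRAP_LIMIT already consumed.
    """
    used = xid_age / XID_WRAP_LIMIT
    if used >= crit_fraction:
        return "critical"
    if used >= warn_fraction:
        return "warning"
    return "ok"
```

For example, `classify(200_000_000)` returns `"ok"`, while an age of 1.2 billion falls in the `"warning"` band with these assumed fractions.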