Sean Rhea <srhea@cisco.com> writes:
> We think we're running into a bug in the pg_notify code.
> We've only seen this bug twice. We can't reproduce it at will, but once it
> starts happening it's 100% reproducible until we implement the workaround
> as described below. Hopefully the information here is enough for you to
> work with, and if not, we understand.
> The symptom we see is that any client attempting to call "LISTEN <channel
> name>;" receives an error like this one:
> ERROR: could not access status of transaction 3767760004
> DETAIL: Could not open file "pg_clog/0E09": No such file or directory.
Hm. This appears to indicate that there's a sending-transaction XID in
the notify queue that's so old that the corresponding clog entry has
been recycled. I don't think there's any direct interlock between the
notify queue contents and the clog recycling mechanisms; but we don't
recycle clog until a VACUUM FREEZE has frozen all older tuples, and
normally that's hundreds of millions of transactions in the past.
So I wonder if you (a) are aggressively forcing freezing, and/or
(b) have some listening session that has been idle-in-transaction
for, um, a very long time. Even if you do, it's not quite clear how
we could have advanced the freeze horizon far enough to allow this
problem to occur: I'd have thought that such an open transaction would
also block freezing. But there's clearly *something* you're doing
that's outside the norm.
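As a sanity check on the error itself, the file name in the DETAIL line can be derived from the XID. The following is only a rough sketch in Python, not code from the server. It assumes the default build constants (8 kB blocks, 2 status bits per transaction, 32 pages per SLRU segment), and the helper name clog_segment_name is made up for illustration:

```python
# Assumed defaults for a stock build; a custom BLCKSZ would change the result.
BLCKSZ = 8192                 # bytes per clog page
CLOG_XACTS_PER_BYTE = 4       # 2 status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32   # pages per segment file

# Number of transactions whose status fits in one segment file.
XACTS_PER_SEGMENT = BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT


def clog_segment_name(xid: int) -> str:
    """Hypothetical helper: segment file name expected to hold this XID's status."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


print(clog_segment_name(3767760004))  # 0E09
```

Under those assumptions the XID in the ERROR line maps to segment 0E09, which matches the file in the DETAIL line. So the lookup appears to be for a real (if very old) XID rather than a garbage value.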
regards, tom lane