Обсуждение: LISTEN fails to "access status of transaction"

Поиск
Список
Период
Сортировка

LISTEN fails to "access status of transaction"

От
Sean Rhea
Дата:
Hello,

We think we're running into a bug in the pg_notify code.

We've only seen this bug twice. We can't reproduce it at will, but once it
starts happening it's 100% reproducible until we implement the workaround
as described below. Hopefully the information here is enough for you to
work with, and if not, we understand.

The symptom we see is that any client attempting to call "LISTEN <channel
name>;" receives an error like this one:

  ERROR:  could not access status of transaction 3767760004
  DETAIL:  Could not open file "pg_clog/0E09": No such file or directory.

However, at the time this occurs, the clog seemed to have advanced well
beyond that log location:

  root@machine:/var/lib/postgresql/9.2/main/pg_clog$ ls 0E* | sort | head -5
  0E1C
  0E1D
  0E1E
  0E1F
  0E20

When this is happening, only LISTEN calls fail; NOTIFY works fine, and
clients performing SELECTs, UPDATEs, etc., don't encounter any problems.
Moreover, during the problem the pg_notify directory has quite a large
number of files.

Our fix is to stop all client processes that had performed a LISTEN on any
channel. Once we do this, Postgres clears out the contents of the pg_notify
directory. After that, all subsequent LISTEN call succeed.

We're running Debian Squeeze on Intel hardware.

production=> select version();
                                        version

---------------------------------------------------------------------------------------
 PostgreSQL 9.2.6 on i686-pc-linux-gnu, compiled by gcc (Debian 4.4.5-8)
4.4.5, 32-bit
(1 row)

Please let me know if there's any additional information that would help
your debugging.

Sean

Re: LISTEN fails to "access status of transaction"

От
Tom Lane
Дата:
Sean Rhea <srhea@cisco.com> writes:
> We think we're running into a bug in the pg_notify code.

> We've only seen this bug twice. We can't reproduce it at will, but once it
> starts happening it's 100% reproducible until we implement the workaround
> as described below. Hopefully the information here is enough for you to
> work with, and if not, we understand.

> The symptom we see is that any client attempting to call "LISTEN <channel
> name>;" receives an error like this one:

>   ERROR:  could not access status of transaction 3767760004
>   DETAIL:  Could not open file "pg_clog/0E09": No such file or directory.

Hm.  This appears to indicate that there's a sending-transaction XID in
the notify queue that's so old that the corresponding clog entry has
been recycled.  I don't think there's any direct interlock between the
notify queue contents and the clog recycling mechanisms; but we don't
recycle clog until a VACUUM FREEZE has frozen all older tuples, and
normally that's hundreds of millions of transactions in the past.
So I wonder if you (a) are aggressively forcing freezing, and/or
(b) have some listening session that has been idle-in-transaction
for, um, a very long time.  Even if you do, it's not quite clear how
we could have advanced the freeze horizon far enough to allow this
problem to occur: I'd have thought that such an open transaction would
also block freezing.  But there's clearly *something* you're doing
that's outside the norm.

            regards, tom lane