[BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: marko@joh.to
Subject: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24
Date:
Msg-id: 20170926182935.14128.65278@wrigleys.postgresql.org
Responses: Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24  (Marko Tiikkaja <marko@joh.to>)
           Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24  (Michael Paquier <michael.paquier@gmail.com>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      14830
Logged by:          Marko Tiikkaja
Email address:      marko@joh.to
PostgreSQL version: Unsupported/Unknown
Operating system:   Ubuntu 14.04
Description:

Hey,

I understand this is not much information to go on (but the problem is
extremely difficult to reproduce), and that 9.1 is technically out of
support (but I don't think the relevant code has changed significantly,
either), so I fully expect that nobody will be able to figure out what's
wrong based on that.  But I thought I'd post anyway.

For the past two days I've been tracking down a bug where it would appear
that some NOTIFications are simply lost.  Then a minute later when the
notification is resent by a different transaction, it comes through just
fine.  We have a single program connected to the database all the time,
which LISTENs on around 800 channels and delivers the notifications to its
own clients.  The problem seems to start only after this application has
been connected for a while, or perhaps it gets worse the longer the
application stays connected to the database.

I'm attaching two excerpts from the strace which, if I'm reading this
correctly, would suggest that there's a bug in postgres here.  Here's how I
read this:
 1) In strace2.txt, the send on line #1 corresponds to 28:3486 in
    strace1.txt.  I know this because notification payloads on that channel
    are unique.
 2) In strace2.txt, on line #5 something slightly out of the ordinary
    happens.  We have around 75 semop calls compared to 5400 semop calls in
    the full strace, so no biggie, but perhaps noteworthy.  Contention with
    another backend, perhaps.
 3) The send on line #6 seems to correspond to 28:3600 in strace1.txt.
 4) Then here's where the problem seems to occur: the next send, on line
    #25, corresponds to 28:4458 in strace1.txt.

Within that ~850 bytes that the sending backend seemingly jumped over, we
have multiple notifications on channels we know the backend was listening
on.  That's including a notification on channel "workerid48101842", which is
the one our application was desperately missing in this case.  PostgreSQL's
logs and the state of the database indicate that at least the transaction
which wrote the "workerid48101842" notification committed, and I have no
reason to believe that any of the other ones near it did not commit.
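
The size of the skipped region follows directly from the positions quoted
above.  A minimal sketch of that arithmetic, assuming the "28:3600"-style
values are page:offset pairs on the same page (the report does not spell out
their format); parse_pos and gap_bytes are hypothetical helper names used
only for illustration:

```python
def parse_pos(pos):
    """Split a 'page:offset' string such as '28:3600' into two integers.

    Assumption: the values quoted from strace1.txt have this shape.
    """
    page, offset = pos.split(":")
    return int(page), int(offset)


def gap_bytes(start, end):
    """Return the number of bytes between two positions on the same page."""
    start_page, start_off = parse_pos(start)
    end_page, end_off = parse_pos(end)
    if start_page != end_page:
        raise ValueError("positions are on different pages")
    return end_off - start_off


# Distance between the first two sends, then the region that was jumped over.
print(gap_bytes("28:3486", "28:3600"))
print(gap_bytes("28:3600", "28:4458"))
```

The second value is the "~850 bytes" referred to above.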

So.. any ideas?  Unfortunately I can't reproduce this in an isolated
environment, and in production this seems to be taking some time before it
builds up into a proper issue.


