Re: [BUGS] Connections hang indefinitely while taking a LWLockTranchebuffer_content lock.

Поиск
Список
Период
Сортировка
От Chris Hanks
Тема Re: [BUGS] Connections hang indefinitely while taking a LWLockTranchebuffer_content lock.
Дата
Msg-id CAK7KUdAzTD53BDL89CesrCYc2hj5+mKmRLyB4yrxxhRPUNu0QQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [BUGS] Connections hang indefinitely while taking a LWLockTranchebuffer_content lock.  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: [BUGS] Connections hang indefinitely while taking a LWLockTranchebuffer_content lock.
Список pgsql-bugs
I'm not sure if you guys had noticed this already, but just in case,
those two hung connections weren't making the inserts at exactly the
same time - if you look at pg_stat_activity they executed about a day
apart (since Heroku cycles the clients every 24 hours or so). And
before I restarted the first DB that experienced the problem, there
were ~30 backends built up over the course of a month. It seems like
when one INSERT sticks, every following INSERT just stacks up on top
of it, trying to take out the same lock.

This isn't to say that a race condition didn't cause the issue in the
first place (it may have, given how difficult it is to reproduce). But
it might have been a read using the index or something like that. From
my knowledge of the usage patterns for the app, I believe that only a
single INSERT should have been taking place at that time - the INSERT
only happens when a worker process starts up, there was only one
worker process for the app, and the worker processes were only
restarted every 24 hours. Also I'd expect this table to be empty when
the INSERT took place (it's an unlogged table that tracks which
connections belong to clients that are ready to pull from a queue).

Also, the only time the index would be used, to my knowledge, is in a
trigger on another table when a row is inserted. It tries to
pseudorandomly pick a client that's ready for work:

```   SELECT pid   INTO locker_pid   FROM (     SELECT *, last_value(row_number) OVER () + 1 AS count     FROM (
SELECT*, row_number() OVER () - 1 AS row_number       FROM (         SELECT *         FROM public.que_lockers ql,
generate_series(1,ql.worker_count) AS id         WHERE listening AND queues @> ARRAY[NEW.queue]         ORDER BY
md5(pid::text|| id::text)       ) t1     ) t2   ) t3   WHERE NEW.id % count = row_number;
 
```

Chris

On Mon, Oct 23, 2017 at 12:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Oct 23, 2017 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This is probably enough info for us to identify the exact cause.  But
>> I don't have time to look right this minute, and am not the hacker most
>> familiar with the GIN infrastructure anyway.  Anyone want to look for
>> the bug?
>
> I don't know very much about GIN, but this does look interesting. I'll
> try to get to it later in the week; I have meetings over the next
> couple of days.
>
> --
> Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: [BUGS] Connections hang indefinitely while taking a LWLockTranchebuffer_content lock.
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [BUGS] BUG #14866: The generated constraint in the typed tablecauses the server to crash