[BUGS] Connections hang indefinitely while taking a LWLockTranche buffer_content lock.

From: Chris Hanks
Subject: [BUGS] Connections hang indefinitely while taking a LWLockTranche buffer_content lock.
Date:
Msg-id: CAK7KUdBjF4TN5Z2Rf0fv1MVUa=uqQssKMhoTXUa2y88R4PPjsg@mail.gmail.com
Replies: Re: [BUGS] Connections hang indefinitely while taking a LWLockTranche buffer_content lock.  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
Hi there -

I'm encountering a locking issue in Postgres. Unfortunately I haven't
been able to construct a reproducible test case, but the problem has
occurred a few times on different machines, and I have access to two
machines that are currently locked up, so I'm hoping someone can help anyway.

Postgres version: "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled
by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit"

The issue is that, during an INSERT into an unlogged table, the
inserting connection freezes and never returns. When I query
pg_stat_activity I see that it is stuck taking a "LWLockTranche"
"buffer_content" lock. These connections can't be killed by
pg_terminate_backend() or pg_cancel_backend(): those functions return
true, but the entries in pg_stat_activity remain. They also persist
long after the client application has exited - I found some on a
staging server that were a month old, and they were only resolved by
restarting the Postgres instance. Their transactions are held open
as well.

Example contents of pg_stat_activity:

```
SELECT pid, wait_event_type, wait_event, state, backend_xid,
backend_xmin, backend_start, xact_start, query_start, state_change,
query FROM pg_stat_activity ORDER BY xact_start
```
```
6840;"LWLockTranche";"buffer_content";"active";11092;11092;"2017-10-18
14:53:01.224955+00";"2017-10-18 14:53:01.243026+00";"2017-10-18
14:53:01.243148+00";"2017-10-18 14:53:01.243149+00";"INSERT INTO
public.que_lockers ( pid, worker_count, worker_priorities, ruby_pid,
ruby_hostname, listening, queues ) VALUES ( pg_backend_pid(),
$1::integer, $2::integer[], $3::integer, $4::text, $5::boolean,
$6::text[] )"
30522;"LWLockTranche";"buffer_content";"active";11093;11092;"2017-10-19
15:21:24.692055+00";"2017-10-19 15:21:24.720159+00";"2017-10-19
15:21:24.720272+00";"2017-10-19 15:21:24.720273+00";"INSERT INTO
public.que_lockers ( pid, worker_count, worker_priorities, ruby_pid,
ruby_hostname, listening, queues ) VALUES ( pg_backend_pid(),
$1::integer, $2::integer[], $3::integer, $4::text, $5::boolean,
$6::text[] )"
628;"LWLockTranche";"buffer_content";"active";11094;11092;"2017-10-19
18:00:51.299233+00";"2017-10-19 18:00:51.329217+00";"2017-10-19
18:00:51.329333+00";"2017-10-19 18:00:51.329334+00";"INSERT INTO
public.que_lockers ( pid, worker_count, worker_priorities, ruby_pid,
ruby_hostname, listening, queues ) VALUES ( pg_backend_pid(),
$1::integer, $2::integer[], $3::integer, $4::text, $5::boolean,
$6::text[] )"
23871;"LWLockTranche";"buffer_content";"active";11095;11092;"2017-10-20
18:08:01.292776+00";"2017-10-20 18:08:01.32107+00";"2017-10-20
18:08:01.32119+00";"2017-10-20 18:08:01.321191+00";"INSERT INTO
public.que_lockers ( pid, worker_count, worker_priorities, ruby_pid,
ruby_hostname, listening, queues ) VALUES ( pg_backend_pid(),
$1::integer, $2::integer[], $3::integer, $4::text, $5::boolean,
$6::text[] )"
25825;"";"";"active";;11092;"2017-10-20
20:10:26.865863+00";"2017-10-20 20:15:33.682626+00";"2017-10-20
20:15:33.682626+00";"2017-10-20 20:15:33.682629+00";"SELECT pid,
wait_event_type, wait_event, state, backend_xid, backend_xmin,
backend_start, xact_start, query_start, state_change, query FROM
pg_stat_activity ORDER BY xact_start"
25823;"";"";"idle";;;"2017-10-20 20:10:17.70007+00";"";"2017-10-20
20:10:26.041268+00";"2017-10-20 20:10:26.042296+00";"SELECT 1 FROM
pg_available_extensions WHERE name='adminpack'"
23868;"";"";"idle";;;"2017-10-20 18:07:57.609124+00";"";"2017-10-20
18:08:01.241658+00";"2017-10-20 18:08:01.241927+00";"SELECT "id",
"pg_subdomain", "custom_domain" FROM "public"."companies""
```
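If it would help, I can also have the heavyweight-lock state for the stuck backends captured. A sketch of such a query, using the stuck PIDs from the output above (note that pg_locks does not display lightweight locks, so the buffer_content wait itself will not show up here; this would only confirm whether any heavyweight locks are also involved):

```
-- Diagnostic sketch: heavyweight locks held or awaited by the
-- stuck backends. LWLocks such as buffer_content are NOT visible
-- in pg_locks; this only surfaces regular (heavyweight) locks.
SELECT l.pid, l.locktype, l.relation::regclass, l.mode, l.granted
FROM pg_locks l
WHERE l.pid IN (6840, 30522, 628, 23871)
ORDER BY l.pid;
```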

The table schema looks like this (copy-pasted from PgAdmin 3):

```
-- Table: public.que_lockers

-- DROP TABLE public.que_lockers;

CREATE UNLOGGED TABLE public.que_lockers
(
  pid integer NOT NULL,
  worker_count integer NOT NULL,
  worker_priorities integer[] NOT NULL,
  ruby_pid integer NOT NULL,
  ruby_hostname text NOT NULL,
  queues text[] NOT NULL,
  listening boolean NOT NULL,
  CONSTRAINT que_lockers_pkey PRIMARY KEY (pid),
  CONSTRAINT valid_queues CHECK
    (array_ndims(queues) = 1 AND array_length(queues, 1) IS NOT NULL),
  CONSTRAINT valid_worker_priorities CHECK
    (array_ndims(worker_priorities) = 1 AND array_length(worker_priorities, 1) IS NOT NULL)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.que_lockers OWNER TO u4eo9shu4pp52d;

-- Index: public.que_lockers_listening_queues_idx

-- DROP INDEX public.que_lockers_listening_queues_idx;

CREATE INDEX que_lockers_listening_queues_idx
  ON public.que_lockers
  USING gin (queues COLLATE pg_catalog."default")
  WHERE listening;
```

The bound parameter values don't appear in the pg_stat_activity output
(beyond pid, which is pg_backend_pid()), but they would have been:
worker_count = 6
worker_priorities = '{10,30,50,NULL,NULL,NULL}'
ruby_pid = (the pid of the client process)
ruby_hostname = (the hostname of the client process, in this case a UUID)
listening = true
queues = '{default}'

The following setup was done for each connection after it was established:
```
SET standard_conforming_strings = ON
SET client_min_messages = 'WARNING'
SET search_path = "public"
SET DateStyle = 'ISO'
SET idle_in_transaction_session_timeout TO '60s'
SET SESSION TIME ZONE 'UTC'
```

Unfortunately the databases I've been able to reproduce this on are
hosted by Heroku, so I don't have direct access to the machines for
debugging. Someone in the #postgresql IRC channel suggested checking
what the backend processes were doing with strace. I asked Heroku
Support to check this, and they responded that the processes were
waiting on `semop` calls, though according to `ipcs -s` none were
currently active. If anyone has more requests for information to be
retrieved from a machine with a stuck process like this, I'm happy to
relay them.
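For example, a stack backtrace from one of the stuck backends seems like it could be useful. A sketch of what I could ask Heroku to run (assuming gdb is available on the host; 6840 is one of the stuck backend PIDs from the pg_stat_activity output above):

```shell
# Attach non-interactively to a stuck backend, dump its stack, and
# detach. Replace 6840 with the PID of an actual stuck backend.
gdb -p 6840 --batch -ex 'bt'

# Trace the same process's system calls, with timings, to confirm
# the reported semop wait:
strace -p 6840 -T
```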

I hope this is enough to go on. If anyone would like any more
information please let me know and I'll do my best to investigate.

Chris


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
