Re: ERROR: too many dynamic shared memory segments

From: Jakub Glapa
Subject: Re: ERROR: too many dynamic shared memory segments
Date:
Msg-id: CAJk1zg2XqYa1Dx14z9X3qkUgWH6kpoNbvSGT_=NtzDPnMhfFHg@mail.gmail.com
In reply to: Re: ERROR: too many dynamic shared memory segments (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: ERROR: too many dynamic shared memory segments
List: pgsql-general
Hi Thomas,
log excerpt:

...
2017-11-27 12:21:14 CET:192.168.10.83(33424):user@db:[27291]: ERROR:  too many dynamic shared memory segments
2017-11-27 12:21:14 CET:192.168.10.83(33424):user@db:[27291]: STATEMENT:  SELECT << REMOVED>>
2017-11-27 12:21:14 CET:192.168.10.83(35182):user@db:[28281]: ERROR:  too many dynamic shared memory segments
2017-11-27 12:21:14 CET:192.168.10.83(35182):user@db:[28281]: STATEMENT:  SELECT <
2017-11-27 12:21:14 CET::@:[28618]: ERROR:  could not map dynamic shared memory segment
2017-11-27 12:21:14 CET::@:[28619]: ERROR:  could not map dynamic shared memory segment
2017-11-27 12:21:14 CET::@:[25645]: LOG:  worker process: parallel worker for PID 27291 (PID 28618) exited with exit code 1
2017-11-27 12:21:14 CET::@:[25645]: LOG:  worker process: parallel worker for PID 27291 (PID 28619) exited with exit code 1
...

This time the db didn't crash, but the queries failed to execute.

The queries are somewhat special. We are still using the old-style (list) partitioning, but we abuse it a bit when querying: instead of going through the parent table, we stitch the required partitions together with UNION ALL. This requires less locking and was more performant in our benchmarks (the new native partitioning might improve on this, but we haven't researched that yet).

The queries have the form:

SELECT col1, col2, col3 FROM (
    SELECT * FROM par1 WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
    UNION ALL
    SELECT * FROM par2 WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
    UNION ALL
    SELECT * FROM par3 WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
    UNION ALL
    ...
) unionalias
WHERE ....
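For comparison, the conventional route would be to query through the inheritance parent and let constraint exclusion prune the children. The sketch below is only an illustration; the parent table name is hypothetical, not from our schema:

    -- hypothetical parent of par1, par2, par3, ...
    SELECT col1, col2, col3
    FROM parent_table
    WHERE rangestart >= '2017-01-01 00:00:00.0'
      AND rangeend   <= '2017-11-26 23:59:59.999';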
And indeed the query plan shows the use of Bitmap Heap Scans:

....
->  Subquery Scan on "*SELECT* 2"  (cost=3068.58..19793.94 rows=1 width=1284)
      ->  Gather  (cost=3068.58..19793.93 rows=1 width=5014)
            Workers Planned: 2
            ->  Parallel Bitmap Heap Scan on par_6  (cost=2068.58..18793.83 rows=1 width=5014)
                  Recheck Cond: <>
                  Filter: <>
                  ->  BitmapAnd  (cost=2068.58..2068.58 rows=30955 width=0)
                        ->  BitmapOr  (cost=999.30..999.30 rows=42989 width=0)
                              ->  Bitmap Index Scan on par_6_datasource  (cost=0.00..990.21 rows=42922 width=0)
                                    Index Cond: ((datasource)::text = 'one'::text)
                              ->  Bitmap Index Scan on par_6_datasource  (cost=0.00..4.30 rows=1 width=0)
                                    Index Cond: ((datasource)::text = 'two'::text)
                              ->  Bitmap Index Scan on par_6_datasource  (cost=0.00..4.79 rows=67 width=0)
                                    Index Cond: ((datasource)::text = 'three'::text)
                        ->  Bitmap Index Scan on par_6_rangestart  (cost=0.00..1069.02 rows=47564 width=0)
                              Index Cond: (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
->  Subquery Scan on "*SELECT* 3"  (cost=761.33..7944.99 rows=1 width=1086)
      ->  Bitmap Heap Scan on par_7  (cost=761.33..7944.98 rows=1 width=4816)
            Recheck Cond: <>
            Filter: <>
            ->  BitmapAnd  (cost=761.33..761.33 rows=7045 width=0)
                  ->  Bitmap Index Scan on par_7_rangestart  (cost=0.00..380.35 rows=14942 width=0)
                        Index Cond: (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
                  ->  BitmapOr  (cost=380.72..380.72 rows=12248 width=0)
                        ->  Bitmap Index Scan on par_7_datasource  (cost=0.00..372.00 rows=12228 width=0)
                              Index Cond: ((datasource)::text = 'one'::text)
                        ->  Bitmap Index Scan on par_7_datasource  (cost=0.00..4.36 rows=10 width=0)
                              Index Cond: ((datasource)::text = 'two'::text)
                        ->  Bitmap Index Scan on par_7_datasource  (cost=0.00..4.36 rows=10 width=0)
                              Index Cond: ((datasource)::text = 'three'::text)
....

In this particular query there were over _100_ partitions connected with the UNION ALL operator.

--
regards, pozdrawiam,
Jakub Glapa

On Mon, Nov 27, 2017 at 11:47 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> On Mon, Nov 27, 2017 at 10:54 PM, Jakub Glapa wrote:
> > The DB enters recovery mode after that.
>
> That's not good.  So it actually crashes?  Can you please show the
> full error messages?
>
> > 2017-11-23 07:20:39 CET::@:[24823]: ERROR:  could not attach to dynamic
> > shared area
>
> From src/backend/utils/mmgr/dsa.c.  The same message is used for
> dsa_attach() and for attach_internal(), but in this case it must be
> the latter because we use in-place DSA areas for parallel query.  This
> means that when the worker tried to attach it found that
> control->refcnt == 0, meaning 'too late, every other backend including
> the leader has already detached'.
>
> > 2017-11-23 07:20:39 CET::@:[24822]: ERROR:  could not map dynamic shared
> > memory segment
>
> From src/backend/access/transam/parallel.c when dsm_attach returns
> null.  I think this also means 'too late'.
>
> So those are both workers that have started up and found that the
> leader has abandoned the parallel query already, but they discovered
> it at different stages.  PID 24823 didn't manage to attach to the DSM
> segment, while PID 24822 did but found that the other(s) had already
> detached from the per-query DSA area inside it.
>
> > 2017-11-23 07:20:40 CET:192.168.xx,xx(33974):u(at)db:[24209]: ERROR:  too
> > many dynamic shared memory segments
> >
> > The errors happen when the parallel execution is enabled and multiple
> > queries are executed simultaneously.
> > If I set the max_parallel_workers_per_gather = 0 the error doesn't occur.
> > The queries are rather big, each executes over 10sec.
> > I see the error when 4 or more queries are started at the same time.
> >
> > my postgresql.conf:
> >
> > max_connections = 100
>
> So the two questions we need to figure out are: (1) how do we manage
> to use up all 64 + 2 * 100 DSM slots (or somehow corrupt things so it
> looks that way) by running only 4 queries, and (2) why would we be
> crashing rather than failing and reporting an error?  I'm not able to
> reproduce the problem from your description by running lots of parallel
> queries at the same time.  Can you provide reproduction steps?
> Does your query plan happen to include a Parallel Bitmap Heap Scan?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
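For reference, a minimal sketch of the workaround mentioned above (my own illustration, not quoted from the thread): with max_connections = 100 there are roughly 64 + 2 * 100 = 264 DSM slots in total, and disabling parallelism at the session level keeps a query from allocating any of them:

    -- per-session workaround: run the large UNION ALL query without parallel workers
    SET max_parallel_workers_per_gather = 0;
    -- ... run the query here ...
    RESET max_parallel_workers_per_gather;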

In pgsql-general by date of posting:

Previous
From: Thomas Munro
Date:
Message: Re: ERROR: too many dynamic shared memory segments
Next
From: PeterS
Date:
Message: Install numpy to use within plpython3u in postgres 9.6 for Windows2012 (64 bit)