Discussion: ERROR: too many dynamic shared memory segments
Hi,
I started seeing these errors on PostgreSQL 10.1 running on an Ubuntu 16.04.3
(64GB/8CPU) server.
The DB enters recovery mode after that.
2017-11-23 07:20:39 CET::@:[24823]: ERROR: could not attach to dynamic
shared area
2017-11-23 07:20:39 CET::@:[24822]: ERROR: could not map dynamic shared
memory segment
2017-11-23 07:20:40 CET:192.168.xx,xx(33974):u(at)db:[24209]: ERROR: too
many dynamic shared memory segments
The errors happen when parallel execution is enabled and multiple
queries are executed simultaneously.
If I set max_parallel_workers_per_gather = 0 the error doesn't occur.
The queries are rather big; each executes for over 10 seconds.
I see the error when 4 or more queries are started at the same time.
my postgresql.conf:
max_connections = 100
shared_buffers = 12GB
effective_cache_size = 51GB
work_mem = 83886kB
maintenance_work_mem = 2GB
max_wal_size = 8GB
min_wal_size = 4GB
checkpoint_completion_target = 0.9
wal_buffers = 32MB
default_statistics_target = 1000
max_locks_per_transaction = 128
#max_parallel_workers_per_gather = 0
PS. I've posted this question on pgsql-admin distro but didn't get any
feedback.
--
regards,
Jakub Glapa
On Mon, Nov 27, 2017 at 10:54 PM, Jakub Glapa <jakub.glapa@gmail.com> wrote:
> The DB enters recovery mode after that.

That's not good.  So it actually crashes?  Can you please show the
full error messages?

> 2017-11-23 07:20:39 CET::@:[24823]: ERROR: could not attach to dynamic
> shared area

From src/backend/utils/mmgr/dsa.c.  The same message is used for
dsa_attach() and for attach_internal(), but in this case it must be
the latter because we use in-place DSA areas for parallel query.  This
means that when the worker tried to attach it found that
control->refcnt == 0, meaning 'too late, every other backend including
the leader has already detached'.

> 2017-11-23 07:20:39 CET::@:[24822]: ERROR: could not map dynamic shared
> memory segment

From src/backend/access/transam/parallel.c when dsm_attach returns
null.  I think this also means 'too late'.

So those are both workers that have started up and found that the
leader has abandoned the parallel query already, but they discovered
it at different stages.  PID 24823 didn't manage to attach to the DSM
segment, while PID 24822 did but found that the other(s) had already
detached from the per-query DSA area inside it.

> 2017-11-23 07:20:40 CET:192.168.xx,xx(33974):u(at)db:[24209]: ERROR: too
> many dynamic shared memory segments
>
> The errors happen when the parallel execution is enabled and multiple
> queries are executed simultaneously.
> If I set the max_parallel_workers_per_gather = 0 the error doesn't occur.
> The queries are rather big, each executes over 10sec.
> I see the error when 4 or more queries are started at the same time.
>
> my postgresql.conf:
>
> max_connections = 100

So the two questions we need to figure out are: (1) how do we manage
to use up all 64 + 2 * 100 DSM slots (or somehow corrupt things so it
looks that way) by running only 4 queries, and (2) why would we be
crashing rather than failing and reporting an error?

I'm not able to reproduce the problem from your description running
lots of parallel queries at the same time.  Can you provide
reproduction steps?  Does your query plan happen to include a Parallel
Bitmap Heap Scan?

--
Thomas Munro
http://www.enterprisedb.com
Hi Thomas,
log excerpt:
...
2017-11-27 12:21:14 CET:192.168.10.83(33424):user@db:[27291]: ERROR: too
many dynamic shared memory segments
2017-11-27 12:21:14 CET:192.168.10.83(33424):user@db:[27291]: STATEMENT: SELECT <<REMOVED>>
2017-11-27 12:21:14 CET:192.168.10.83(35182):user@db:[28281]: ERROR: too
many dynamic shared memory segments
2017-11-27 12:21:14 CET:192.168.10.83(35182):user@db:[28281]: STATEMENT: SELECT <<REMOVED>>
2017-11-27 12:21:14 CET::@:[28618]: ERROR: could not map dynamic shared
memory segment
2017-11-27 12:21:14 CET::@:[28619]: ERROR: could not map dynamic shared
memory segment
2017-11-27 12:21:14 CET::@:[25645]: LOG: worker process: parallel worker
for PID 27291 (PID 28618) exited with exit code 1
2017-11-27 12:21:14 CET::@:[25645]: LOG: worker process: parallel worker
for PID 27291 (PID 28619) exited with exit code 1
...
This time the DB didn't crash, but the queries failed to execute.
The queries are somewhat special.
We are still using the old-style partitioning (list type) but we abuse it a
bit when querying.
When querying a set of partitions, instead of doing it via the parent table we
stitch together the required tables with UNION ALL (this requires less
locking and was more performant in our benchmarks; the new native
partitioning might improve this but we haven't researched that yet).
The queries are in the form of:

SELECT col1,col2,col3 FROM
(SELECT * FROM par1
 WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
 UNION ALL
 SELECT * FROM par2
 WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
 UNION ALL
 SELECT * FROM par2
 WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
 WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
 UNION ALL
 ...
) unionalias
WHERE ....
and indeed the query planner shows the usage of Bitmap Heap Scan
....
-> Subquery Scan on "*SELECT* 2"  (cost=3068.58..19793.94 rows=1 width=1284)
   -> Gather  (cost=3068.58..19793.93 rows=1 width=5014)
      Workers Planned: 2
      -> Parallel Bitmap Heap Scan on par_6  (cost=2068.58..18793.83 rows=1 width=5014)
         Recheck Cond: <<CONDITION>>
         Filter: <<CONDITION>>
         -> BitmapAnd  (cost=2068.58..2068.58 rows=30955 width=0)
            -> BitmapOr  (cost=999.30..999.30 rows=42989 width=0)
               -> Bitmap Index Scan on par_6_datasource  (cost=0.00..990.21 rows=42922 width=0)
                  Index Cond: ((datasource)::text = 'one'::text)
               -> Bitmap Index Scan on par_6_datasource  (cost=0.00..4.30 rows=1 width=0)
                  Index Cond: ((datasource)::text = 'two'::text)
               -> Bitmap Index Scan on par_6_datasource  (cost=0.00..4.79 rows=67 width=0)
                  Index Cond: ((datasource)::text = 'three'::text)
            -> Bitmap Index Scan on par_6_rangestart  (cost=0.00..1069.02 rows=47564 width=0)
               Index Cond: (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
-> Subquery Scan on "*SELECT* 3"  (cost=761.33..7944.99 rows=1 width=1086)
   -> Bitmap Heap Scan on par_7  (cost=761.33..7944.98 rows=1 width=4816)
      Recheck Cond: <<CONDITION>>
      Filter: <<CONDITION>>
      -> BitmapAnd  (cost=761.33..761.33 rows=7045 width=0)
         -> Bitmap Index Scan on par_7_rangestart  (cost=0.00..380.35 rows=14942 width=0)
            Index Cond: (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
         -> BitmapOr  (cost=380.72..380.72 rows=12248 width=0)
            -> Bitmap Index Scan on par_7_datasource  (cost=0.00..372.00 rows=12228 width=0)
               Index Cond: ((datasource)::text = 'one'::text)
            -> Bitmap Index Scan on par_7_datasource  (cost=0.00..4.36 rows=10 width=0)
               Index Cond: ((datasource)::text = 'two'::text)
            -> Bitmap Index Scan on par_7_datasource  (cost=0.00..4.36 rows=10 width=0)
               Index Cond: ((datasource)::text = 'three'::text)
....
In this particular query there were over _100_ partitions connected with
the UNION ALL operator.
--
regards,
pozdrawiam,
Jakub Glapa
On Mon, Nov 27, 2017 at 11:47 AM, Thomas Munro <
thomas.munro@enterprisedb.com> wrote:
> On Mon, Nov 27, 2017 at 10:54 PM, Jakub Glapa
> wrote:
> > The DB enters recovery mode after that.
>
> That's not good. So it actually crashes? Can you please show the
> full error messages?
>
> > 2017-11-23 07:20:39 CET::@:[24823]: ERROR: could not attach to dynamic
> > shared area
>
> From src/backend/utils/mmgr/dsa.c. The same message is used for
> dsa_attach() and for attach_internal(), but in this case it must be
> the latter because we use in-place DSA areas for parallel query. This
> means that when the worker tried to attach it found that
> control->refcnt == 0, meaning 'too late, every other backend including
> the leader has already detached'.
>
> > 2017-11-23 07:20:39 CET::@:[24822]: ERROR: could not map dynamic shared
> > memory segment
>
> From src/backend/access/transam/parallel.c when dsm_attach returns
> null. I think this also means 'too late'.
>
> So those are both workers that have started up and found that the
> leader has abandoned the parallel query already, but they discovered
> it at different stages. PID 24823 didn't manage to attach to the DSM
> segment, while PID 24822 did but found that the other(s) had already
> detached from the per-query DSA area inside it.
>
> > 2017-11-23 07:20:40 CET:192.168.xx,xx(33974):u(at)db:[24209]: ERROR:
> too
> > many
> > dynamic shared memory segments
> >
> > The errors happen when the parallel execution is enabled and multiple
> > queries are executed simultaneously.
> > If I set the max_parallel_workers_per_gather = 0 the error doesn't
> occur.
> > The queries are rather big, each executes over 10sec.
> > I see the error when 4 or more queries are started at the same time.
> >
> > my postgresql.conf:
> >
> > max_connections = 100
>
> So the two questions we need to figure out are: (1) how do we manage
> to use up all 64 + 2 * 100 DSM slots (or somehow corrupt things so it
> looks that way) by running only 4 queries, and (2) why would we be
> crashing rather than failing and reporting an error? I'm not able to
> reproduce the problem from your description running lots of parallel
> queries running at the same time. Can you provide reproduction steps?
> Does your query plan happen to include a Parallel Bitmap Heap Scan?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
On Tue, Nov 28, 2017 at 1:13 AM, Jakub Glapa <jakub.glapa@gmail.com> wrote:
> The queries are somehow special.
> We are still using the old style partitioning (list type) but we abuse it a
> bit when querying.
> When querying a set of partitions instead of doing it via parent table we
> stitch together the required tables with UNION ALL (this requires less
> locking) and was more performant in our benchmark (the new native
> partitioning might improve this but we didn't research that yet).
>
> The queries are in form of
> SELECT col1,col2,col3 FROM
> (SELECT * FROM par1
>  WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
>  UNION ALL
>  SELECT * FROM par2
>  WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
>  UNION ALL
>  SELECT * FROM par2
>  WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
>  WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
>  UNION ALL
>  ...
> ) unionalias
> WHERE ....
>
> and indeed the query planner shows the usage of Bitmap Heap Scan
> ....
> -> Subquery Scan on "*SELECT* 2"  (cost=3068.58..19793.94 rows=1 width=1284)
>    -> Gather  (cost=3068.58..19793.93 rows=1 width=5014)
>       Workers Planned: 2
>       -> Parallel Bitmap Heap Scan on par_6  (cost=2068.58..18793.83 rows=1 width=5014)
>          Recheck Cond: <<CONDITION>>
>          Filter: <<CONDITION>>
>          -> BitmapAnd  (cost=2068.58..2068.58 rows=30955 width=0)
>             -> BitmapOr  (cost=999.30..999.30 rows=42989 width=0)
>                -> Bitmap Index Scan on par_6_datasource  (cost=0.00..990.21 rows=42922 width=0)
>                   Index Cond: ((datasource)::text = 'one'::text)
>                -> Bitmap Index Scan on par_6_datasource  (cost=0.00..4.30 rows=1 width=0)
>                   Index Cond: ((datasource)::text = 'two'::text)
>                -> Bitmap Index Scan on par_6_datasource  (cost=0.00..4.79 rows=67 width=0)
>                   Index Cond: ((datasource)::text = 'three'::text)
>             -> Bitmap Index Scan on par_6_rangestart  (cost=0.00..1069.02 rows=47564 width=0)
>                Index Cond: (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
> -> Subquery Scan on "*SELECT* 3"  (cost=761.33..7944.99 rows=1 width=1086)
>    -> Bitmap Heap Scan on par_7  (cost=761.33..7944.98 rows=1 width=4816)
>       Recheck Cond: <<CONDITION>>
>       Filter: <<CONDITION>>
>       -> BitmapAnd  (cost=761.33..761.33 rows=7045 width=0)
>          -> Bitmap Index Scan on par_7_rangestart  (cost=0.00..380.35 rows=14942 width=0)
>             Index Cond: (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
>          -> BitmapOr  (cost=380.72..380.72 rows=12248 width=0)
>             -> Bitmap Index Scan on par_7_datasource  (cost=0.00..372.00 rows=12228 width=0)
>                Index Cond: ((datasource)::text = 'one'::text)
>             -> Bitmap Index Scan on par_7_datasource  (cost=0.00..4.36 rows=10 width=0)
>                Index Cond: ((datasource)::text = 'two'::text)
>             -> Bitmap Index Scan on par_7_datasource  (cost=0.00..4.36 rows=10 width=0)
>                Index Cond: ((datasource)::text = 'three'::text)
> ....
>
> In this particular query there were over _100_ partitions connected with the
> UNION ALL operator.
Ah, so you have many Gather nodes under Append?  That's one way to eat
arbitrarily many DSM slots.  We allow for 64 + 2 * max_backends.  Does
it help if you increase max_connections?  I am concerned about the
crash failure mode you mentioned in the first email though: we should
always be able to handle that condition gracefully.

--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Ah, so you have many Gather nodes under Append?  That's one way to eat
> arbitrarily many DSM slots.  We allow for 64 + 2 * max_backends.  Does
> it help if you increase max_connections?  I am concerned about the
> crash failure mode you mentioned in the first email though: we should
> always be able to handle that condition gracefully.

I suspect this is an instance of the issue I complained about before
[1] that parallel query is unable to cope with worker start failure.

			regards, tom lane

[1] https://www.postgresql.org/message-id/4905.1492813727@sss.pgh.pa.us
Hi Thomas,
Doubling max_connections made the problem go away for now! Yay!
As for the crash: I dug up the initial log and it looks like a segmentation
fault...
2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR: too
many dynamic shared memory segments
2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: STATEMENT: SELECT ....
2017-11-23 07:26:55 CET::@:[29398]: LOG: server process (PID 26992) was
terminated by signal 11: Segmentation fault
2017-11-23 07:26:55 CET::@:[29398]: DETAIL: Failed process was running:
SELECT .....
2017-11-23 07:26:55 CET::@:[29398]: LOG: terminating any other active
server processes
2017-11-23 07:26:55 CET:192.168.10.83(35242):user@db:[30009]: WARNING: terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35242):user@db:[30009]: DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(35242):user@db:[30009]: HINT: In a
moment you should be able to reconnect to the database and repeat your
command.
2017-11-23 07:26:55 CET:192.168.10.83(35268):user@db:[30097]: WARNING: terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35268):user@db:[30097]: DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(35268):user@db:[30097]: HINT: In a
moment you should be able to reconnect to the database and repeat your
command.
2017-11-23 07:26:55 CET:192.168.10.83(46614):user@db:[31080]: WARNING: terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(46614):user@db:[31080]: DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(46614):user@db: HINT: In a moment
you should be able to reconnect to the database and repeat your command.
2017-11-23 07:26:55 CET:192.168.10.83(35238):user@db:[30003]: WARNING: terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35238):user@db:[30003]: DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory
2017-11-23 07:26:55 CET:192.168.10.83(35300):user@db:[30152]: FATAL: the
database system is in recovery mode
--
regards,
pozdrawiam,
Jakub Glapa
On Mon, Nov 27, 2017 at 7:53 PM, Thomas Munro wrote:
> On Tue, Nov 28, 2017 at 1:13 AM, Jakub Glapa
> wrote:
> > The queries are somehow special.
> > We are still using the old style partitioning (list type) but we abuse
> it a
> > bit when querying.
> > When querying a set of partitions instead of doing it via parent table we
> > stitch together the required tables with UNION ALL (this requires less
> > locking) and was more performant in our benchmark (the new native
> > partitioning might improve this but we didn't research that yet).
> >
> > The queries are in form of
> > SELECT col1,col2,col3 FROM
> > (SELECT *
> > FROM par1
> > WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <=
> > '2017-11-26 23:59:59.999'
> > UNION ALL SELECT *
> > FROM par2
> > WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend
> <=
> > '2017-11-26 23:59:59.999'
> > UNION ALL SELECT *
> > FROM par2
> > WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend
> <=
> > '2017-11-26 23:59:59.999'
> >
> > WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <=
> > '2017-11-26 23:59:59.999'
> >
> > UNION ALL
> > ...
> > ) unionalias
> > WHERE ....
> >
> >
> >
> > and indeed the query planner shows the usage of Bitmap Heap Scan
> > ....
> > -> Subquery Scan on "*SELECT* 2"
> > (cost=3068.58..19793.94 rows=1 width=1284)
> > -> Gather
> (cost=3068.58..19793.93
> > rows=1 width=5014)
> > Workers Planned: 2
> > -> Parallel Bitmap Heap
> Scan
> > on par_6 (cost=2068.58..18793.83 rows=1 width=5014)
> > Recheck Cond:
> > <>
> > Filter: <>
> > -> BitmapAnd
> > (cost=2068.58..2068.58 rows=30955 width=0)
> > -> BitmapOr
> > (cost=999.30..999.30 rows=42989 width=0)
> > -> Bitmap
> > Index Scan on par_6_datasource (cost=0.00..990.21 rows=42922 width=0)
> >
> Index
> > Cond: ((datasource)::text = 'one'::text)
> > -> Bitmap
> > Index Scan on par_6_datasource (cost=0.00..4.30 rows=1 width=0)
> >
> Index
> > Cond: ((datasource)::text = 'two'::text)
> > -> Bitmap
> > Index Scan on par_6_datasource (cost=0.00..4.79 rows=67 width=0)
> >
> Index
> > Cond: ((datasource)::text = 'three'::text)
> > -> Bitmap Index
> > Scan on par_6_rangestart (cost=0.00..1069.02 rows=47564 width=0)
> > Index
> Cond:
> > (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
> > -> Subquery Scan on "*SELECT* 3"
> > (cost=761.33..7944.99 rows=1 width=1086)
> > -> Bitmap Heap Scan on par_7
> > (cost=761.33..7944.98 rows=1 width=4816)
> > Recheck Cond:
> > <>
> > Filter: <>
> > -> BitmapAnd
> > (cost=761.33..761.33 rows=7045 width=0)
> > -> Bitmap Index Scan
> on
> > par_7_rangestart (cost=0.00..380.35 rows=14942 width=0)
> > Index Cond:
> > (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
> > -> BitmapOr
> > (cost=380.72..380.72 rows=12248 width=0)
> > -> Bitmap Index
> > Scan on par_7_datasource (cost=0.00..372.00 rows=12228 width=0)
> > Index
> Cond:
> > ((datasource)::text = 'one'::text)
> > -> Bitmap Index
> > Scan on par_7_datasource (cost=0.00..4.36 rows=10 width=0)
> > Index
> Cond:
> > ((datasource)::text = 'two'::text)
> > -> Bitmap Index
> > Scan on par_7_datasource (cost=0.00..4.36 rows=10 width=0)
> > Index
> Cond:
> > ((datasource)::text = 'three'::text)
> >
> > ....
> >
> >
> > In this particular query there were over _100_ partitions connected with
> the
> > UNION ALL operator.
>
> Ah, so you have many Gather nodes under Append? That's one way to eat
> arbitrarily many DSM slots. We allow for 64 + 2 * max_backends. Does
> it help if you increase max_connections? I am concerned about the
> crash failure mode you mentioned in the first email though: we should
> always be able to handle that condition gracefully.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa <jakub.glapa@gmail.com> wrote:
> As for the crash. I dug up the initial log and it looks like a segmentation
> fault...
>
> 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR: too
> many dynamic shared memory segments

Hmm.  Well this error can only occur in dsm_create() called without
DSM_CREATE_NULL_IF_MAXSEGMENTS.  parallel.c calls it with that flag
and dsa.c doesn't (perhaps it should, not sure, but that'd just change
the error message), so that means the error arose from dsa.c trying to
get more segments.  That would be when Parallel Bitmap Heap Scan tried
to allocate memory.

I hacked my copy of PostgreSQL so that it allows only 5 DSM slots and
managed to reproduce a segv crash by trying to run concurrent Parallel
Bitmap Heap Scans.  The stack looks like this:

* frame #0: 0x00000001083ace29 postgres`alloc_object(area=0x0000000000000000, size_class=10) + 25 at dsa.c:1433
  frame #1: 0x00000001083acd14 postgres`dsa_allocate_extended(area=0x0000000000000000, size=72, flags=4) + 1076 at dsa.c:785
  frame #2: 0x0000000108059c33 postgres`tbm_prepare_shared_iterate(tbm=0x00007f9743027660) + 67 at tidbitmap.c:780
  frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
  frame #4: 0x0000000107fefc5b postgres`ExecScanFetch(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 459 at execScan.c:95
  frame #5: 0x0000000107fef983 postgres`ExecScan(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 147 at execScan.c:162
  frame #6: 0x00000001080008d1 postgres`ExecBitmapHeapScan(pstate=0x00007f9743019c88) + 49 at nodeBitmapHeapscan.c:735

(lldb) f 3
frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
   153	     * dsa_pointer of the iterator state which will be used by
   154	     * multiple processes to iterate jointly.
   155	     */
-> 156	    pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
   157	#ifdef USE_PREFETCH
   158	    if (node->prefetch_maximum > 0)
   159
(lldb) print tbm->dsa
(dsa_area *) $3 = 0x0000000000000000
(lldb) print node->ss.ps.state->es_query_dsa
(dsa_area *) $5 = 0x0000000000000000
(lldb) f 17
frame #17: 0x000000010800363b postgres`ExecGather(pstate=0x00007f9743019320) + 635 at nodeGather.c:220
   217	     * Get next tuple, either from one of our workers, or by running the plan
   218	     * ourselves.
   219	     */
-> 220	    slot = gather_getnext(node);
   221	    if (TupIsNull(slot))
   222	        return NULL;
   223
(lldb) print *node->pei
(ParallelExecutorInfo) $8 = {
  planstate = 0x00007f9743019640
  pcxt = 0x00007f97450001b8
  buffer_usage = 0x0000000108b7e218
  instrumentation = 0x0000000108b7da38
  area = 0x0000000000000000
  param_exec = 0
  finished = '\0'
  tqueue = 0x0000000000000000
  reader = 0x0000000000000000
}
(lldb) print *node->pei->pcxt
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
(ParallelContext) $9 = {
  node = {
    prev = 0x000000010855fb60
    next = 0x000000010855fb60
  }
  subid = 1
  nworkers = 0
  nworkers_launched = 0
  library_name = 0x00007f9745000248 "postgres"
  function_name = 0x00007f9745000268 "ParallelQueryMain"
  error_context_stack = 0x0000000000000000
  estimator = (space_for_chunks = 180352, number_of_keys = 19)
  seg = 0x0000000000000000
  private_memory = 0x0000000108b53038
  toc = 0x0000000108b53038
  worker = 0x0000000000000000
}

I think there are two failure modes: one of your sessions showed the
"too many ..." error (that's good, ran out of slots and said so and
our error machinery worked as it should), and another crashed with a
segfault, because it tried to use a NULL "area" pointer (bad).  I
think this is a degenerate case where we completely failed to launch
parallel query, but we ran the parallel query plan anyway and this
code thinks that the DSA is available.  Oops.

--
Thomas Munro
http://www.enterprisedb.com
On Tue, Nov 28, 2017 at 4:18 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa <jakub.glapa@gmail.com> wrote:
> As for the crash. I dug up the initial log and it looks like a segmentation
> fault...
>
> 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR: too
> many dynamic shared memory segments
I think there are two failure modes: one of your sessions showed the
"too many ..." error (that's good, ran out of slots and said so and
our error machinery worked as it should), and another crashed with a
segfault, because it tried to use a NULL "area" pointer (bad). I
think this is a degenerate case where we completely failed to launch
parallel query, but we ran the parallel query plan anyway and this
code thinks that the DSA is available. Oops.
I think BitmapHeapScan should check whether the DSA is valid; if the DSA is not valid then it should assume it's a non-parallel plan.
Attached patch should fix the issue.
Attachments
On Tue, Nov 28, 2017 at 2:32 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I think BitmapHeapScan check whether dsa is valid or not if DSA is not
> valid then it should assume it's non-parallel plan.
>
> Attached patch should fix the issue.

So, create the pstate and then pretend we didn't?  Why not just avoid
creating it in the first place, like this?

I haven't checked whether this fixes the bug, but if it does, we can
avoid introducing an extra branch in BitmapHeapNext.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
On Tue, Nov 28, 2017 at 7:13 PM, Robert Haas wrote:
> On Tue, Nov 28, 2017 at 2:32 AM, Dilip Kumar
> wrote:
> > I think BitmapHeapScan check whether dsa is valid or not if DSA is not
> > valid then it should assume it's non-parallel plan.
> >
> > Attached patch should fix the issue.
>
> So, create the pstate and then pretend we didn't? Why not just avoid
> creating it in the first place, like this?
>
This is a better way to fix it.
>
> I haven't checked whether this fixes the bug, but if it does, we can
> avoid introducing an extra branch in BitmapHeapNext.
With my test it's fixing the problem.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Nov 28, 2017 at 9:45 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> I haven't checked whether this fixes the bug, but if it does, we can
>> avoid introducing an extra branch in BitmapHeapNext.
>
> With my test it's fixing the problem.

I tested it some more and found that, for me, it PARTIALLY fixes the
problem.  I tested like this:

--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -279,7 +279,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
      * parallelism than to fail outright.
      */
     segsize = shm_toc_estimate(&pcxt->estimator);
-    if (pcxt->nworkers > 0)
+    if (pcxt->nworkers > 0 && false)
         pcxt->seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS);
     if (pcxt->seg != NULL)
         pcxt->toc = shm_toc_create(PARALLEL_MAGIC,

That turned out to produce more than one problem.  I find that the
select_parallel test then fails like this:

ERROR:  could not find key 18446744073709486082 in shm TOC at 0x10be98040

The fix for that problem seems to be:

--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -430,7 +430,8 @@ ReinitializeParallelDSM(ParallelContext *pcxt)

     /* Recreate error queues. */
     error_queue_space =
-        shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, false);
+        shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, true);
+    Assert(pcxt->nworkers == 0 || error_queue_space != NULL);
     for (i = 0; i < pcxt->nworkers; ++i)
     {
         char *start;

With that fix in place, I then hit a crash in parallel bitmap heap
scan.  After applying no-pstate.patch, which I just committed and
back-patched to v10, things look OK.  I'm going to apply the fix for
the error_queue_space problem also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I see that the segfault is under active discussion but just wanted to ask
if increasing the max_connections to mitigate the DSM slots shortage is the
way to go?
--
regards,
Jakub Glapa
On Mon, Nov 27, 2017 at 11:48 PM, Thomas Munro <
thomas.munro@enterprisedb.com> wrote:
> On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa
> wrote:
> > As for the crash. I dug up the initial log and it looks like a
> segmentation
> > fault...
> >
> > 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR:
> too
> > many dynamic shared memory segments
>
> Hmm. Well this error can only occur in dsm_create() called without
> DSM_CREATE_NULL_IF_MAXSEGMENTS. parallel.c calls it with that flag
> and dsa.c doesn't (perhaps it should, not sure, but that'd just change
> the error message), so that means the error arose from dsa.c
> trying to get more segments. That would be when Parallel Bitmap Heap
> Scan tried to allocate memory.
>
> I hacked my copy of PostgreSQL so that it allows only 5 DSM slots and
> managed to reproduce a segv crash by trying to run concurrent Parallel
> Bitmap Heap Scans. The stack looks like this:
>
> * frame #0: 0x00000001083ace29
> postgres`alloc_object(area=0x0000000000000000, size_class=10) + 25 at
> dsa.c:1433
> frame #1: 0x00000001083acd14
> postgres`dsa_allocate_extended(area=0x0000000000000000, size=72,
> flags=4) + 1076 at dsa.c:785
> frame #2: 0x0000000108059c33
> postgres`tbm_prepare_shared_iterate(tbm=0x00007f9743027660) + 67 at
> tidbitmap.c:780
> frame #3: 0x0000000108000d57
> postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at
> nodeBitmapHeapscan.c:156
> frame #4: 0x0000000107fefc5b
> postgres`ExecScanFetch(node=0x00007f9743019c88,
> accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77),
> recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) +
> 459 at execScan.c:95
> frame #5: 0x0000000107fef983
> postgres`ExecScan(node=0x00007f9743019c88,
> accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77),
> recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) +
> 147 at execScan.c:162
> frame #6: 0x00000001080008d1
> postgres`ExecBitmapHeapScan(pstate=0x00007f9743019c88) + 49 at
> nodeBitmapHeapscan.c:735
>
> (lldb) f 3
> frame #3: 0x0000000108000d57
> postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at
> nodeBitmapHeapscan.c:156
> 153 * dsa_pointer of the iterator state which will be used by
> 154 * multiple processes to iterate jointly.
> 155 */
> -> 156 pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
> 157 #ifdef USE_PREFETCH
> 158 if (node->prefetch_maximum > 0)
> 159
> (lldb) print tbm->dsa
> (dsa_area *) $3 = 0x0000000000000000
> (lldb) print node->ss.ps.state->es_query_dsa
> (dsa_area *) $5 = 0x0000000000000000
> (lldb) f 17
> frame #17: 0x000000010800363b
> postgres`ExecGather(pstate=0x00007f9743019320) + 635 at
> nodeGather.c:220
> 217 * Get next tuple, either from one of our workers, or by running the
> plan
> 218 * ourselves.
> 219 */
> -> 220 slot = gather_getnext(node);
> 221 if (TupIsNull(slot))
> 222 return NULL;
> 223
> (lldb) print *node->pei
> (ParallelExecutorInfo) $8 = {
> planstate = 0x00007f9743019640
> pcxt = 0x00007f97450001b8
> buffer_usage = 0x0000000108b7e218
> instrumentation = 0x0000000108b7da38
> area = 0x0000000000000000
> param_exec = 0
> finished = '\0'
> tqueue = 0x0000000000000000
> reader = 0x0000000000000000
> }
> (lldb) print *node->pei->pcxt
> warning: could not load any Objective-C class information. This will
> significantly reduce the quality of type information available.
> (ParallelContext) $9 = {
> node = {
> prev = 0x000000010855fb60
> next = 0x000000010855fb60
> }
> subid = 1
> nworkers = 0
> nworkers_launched = 0
> library_name = 0x00007f9745000248 "postgres"
> function_name = 0x00007f9745000268 "ParallelQueryMain"
> error_context_stack = 0x0000000000000000
> estimator = (space_for_chunks = 180352, number_of_keys = 19)
> seg = 0x0000000000000000
> private_memory = 0x0000000108b53038
> toc = 0x0000000108b53038
> worker = 0x0000000000000000
> }
>
> I think there are two failure modes: one of your sessions showed the
> "too many ..." error (that's good, ran out of slots and said so and
> our error machinery worked as it should), and another crashed with a
> segfault, because it tried to use a NULL "area" pointer (bad). I
> think this is a degenerate case where we completely failed to launch
> parallel query, but we ran the parallel query plan anyway and this
> code thinks that the DSA is available. Oops.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
On Tue, Dec 5, 2017 at 1:18 AM, Jakub Glapa <jakub.glapa@gmail.com> wrote:
> I see that the segfault is under active discussion but just wanted to ask if
> increasing the max_connections to mitigate the DSM slots shortage is the way
> to go?

Hi Jakub,

Yes.  In future releases this situation will improve (maybe we'll
figure out how to use one DSM segment for all the gather nodes in your
query plan, and maybe it'll be moot anyway because maybe we'll be able
to use a Parallel Append for queries like yours so that it uses the
same set of workers over all the child plans instead of the
fork()-fest you're presumably seeing).  For now your only choice, if
you want that plan to run, is to crank up max_connections so that the
total number of concurrently executing Gather nodes is less than about
64 + 2 * max_connections.  There is also a crash bug right now in the
out-of-slots case as discussed, fixed in the next point release, but
even with that fix in place you'll still need a high enough
max_connections setting to be sure to be able to complete the query
without an error.

Thanks for the report!

--
Thomas Munro
http://www.enterprisedb.com
Thank You Thomas!
--
regards,
Jakub Glapa