Обсуждение: pgsql: Handle lack of DSM slots in parallel btree build.
Handle lack of DSM slots in parallel btree build. If no DSM slots are available, a ParallelContext can still be created, but its seg pointer is NULL. Teach parallel btree build to cope with that by falling back to a regular non-parallel build, to avoid crashing with a segmentation fault. Back-patch to 11, where parallel CREATE INDEX landed. Reported-by: Nicola Contu Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/CA%2BhUKGJgJEBnkuODBVomyK3MWFvDBbMVj%3Dgdt6DnRPU-5sQ6UQ%40mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/74618e77b43cfce670b4725d5b9a300a2afd12d1 Modified Files -------------- src/backend/access/nbtree/nbtsort.c | 9 +++++++++ 1 file changed, 9 insertions(+)
On Thu, Jan 30, 2020 at 2:34 PM Thomas Munro <tmunro@postgresql.org> wrote: > Handle lack of DSM slots in parallel btree build. > > If no DSM slots are available, a ParallelContext can still be > created, but its seg pointer is NULL. Teach parallel btree build > to cope with that by falling back to a regular non-parallel build, > to avoid crashing with a segmentation fault. Uh, this seems to have completely disabled parallel index builds on the master branch. -- Peter Geoghegan
Hi Peter, On Tue, Feb 4, 2020 at 9:13 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Jan 30, 2020 at 2:34 PM Thomas Munro <tmunro@postgresql.org> wrote: > > Handle lack of DSM slots in parallel btree build. > > > > If no DSM slots are available, a ParallelContext can still be > > created, but its seg pointer is NULL. Teach parallel btree build > > to cope with that by falling back to a regular non-parallel build, > > to avoid crashing with a segmentation fault. > > Uh, this seems to have completely disabled parallel index builds on > the master branch. Oops. The check needs to move down below InitializeParallelDSM(), and release any extra resources that might have been acquired (snapshot?). I will do some testing in a few hours and post a fix.
On Tue, Feb 4, 2020 at 10:03 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Tue, Feb 4, 2020 at 9:13 AM Peter Geoghegan <pg@bowt.ie> wrote: > > Uh, this seems to have completely disabled parallel index builds on > > the master branch. > > Oops. The check needs to move down below InitializeParallelDSM(), and > release any extra resources that might have been acquired (snapshot?). > I will do some testing in a few hours and post a fix. Here is take 2. I tested CI and CIC, and verified that workers are started or not depending on dsm_create() failure, using the attached fault injector patch. (Fuzzing the regression tests repeatedly with that applied also revealed similar problems elsewhere in the tree, which I'll write about in a new thread.)
Вложения
On Tue, Feb 4, 2020 at 2:18 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Here is take 2. I tested CI and CIC, and verified that workers are > started or not depending on dsm_create() failure, using the attached > fault injector patch. (Fuzzing the regression tests repeatedly with > that applied also revealed similar problems elsewhere in the tree, > which I'll write about in a new thread.) Can we reuse _bt_end_parallel() this time around? It would be easy to add a "bool wait" argument to _bt_end_parallel(). All existing callers would pass true, but when not using parallelism due to the new DSM segment check we'd pass false. -- Peter Geoghegan
On Tue, Feb 4, 2020 at 9:42 AM Peter Geoghegan <pg@bowt.ie> wrote: > Can we reuse _bt_end_parallel() this time around? It would be easy to > add a "bool wait" argument to _bt_end_parallel(). All existing callers > would pass true, but when not using parallelism due to the new DSM > segment check we'd pass false. Hmm. I see why you didn't do it that way -- we don't quite have the variables set up in the way that _bt_end_parallel() expects them. This looks good, then. -- Peter Geoghegan
On Wed, Feb 5, 2020 at 6:47 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Feb 4, 2020 at 9:42 AM Peter Geoghegan <pg@bowt.ie> wrote: > > Can we reuse _bt_end_parallel() this time around? It would be easy to > > add a "bool wait" argument to _bt_end_parallel(). All existing callers > > would pass true, but when not using parallelism due to the new DSM > > segment check we'd pass false. > > Hmm. I see why you didn't do it that way -- we don't quite have the > variables set up in the way that _bt_end_parallel() expects them. > > This looks good, then. Thanks. Pushed.
On Tue, Feb 4, 2020 at 3:38 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Thanks. Pushed. Thanks. -- Peter Geoghegan