Re: a raft of parallelism-related bug fixes

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: a raft of parallelism-related bug fixes
Дата
Msg-id CA+TgmoYd53hV7U1-qSTmaQ1UEmk+j_0EaGaQ1iMPLxZd_esxDQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: a raft of parallelism-related bug fixes  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: a raft of parallelism-related bug fixes  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Mon, Oct 19, 2015 at 12:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
>> special buildfarm support is required - you would just add that to the
>> animal's config file, more or less like this:
>>
>>      config_env =>
>>      {
>>          CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
>>      },
>>
>> I try to make things easy :-)
>
> Wow, that's great.  So, I'll try to rework the test code I posted
> previously into something less hacky, and eventually add a #define
> like this so we can run it on the buildfarm.  There's a few other
> things that need to get done before that really makes sense - like
> getting the rest of the bug fix patches committed - otherwise any
> buildfarm critters we add will just be permanently red.

OK, so after a bit more delay than I would have liked, I now have a
working set of patches that we can use to ensure automated testing of
the parallel mode infrastructure.  I ended up doing something that
does not require a #define, so I'll need some guidance on what to do
on the BF side given that context.  Please find attached three
patches, two of them for commit.

group-locking-v1.patch is a vastly improved version of the group
locking patch that we discussed, uh, extensively last year.  I realize
that there was a lot of doubt about this approach, but I still believe
it's the right approach, I have put a lot of work into making it work
correctly, I don't think anyone has come up with a really plausible
alternative approach (except one other approach I tried which turned
out to work but with significantly more restrictions), and I'm
committed to fixing it in whatever way is necessary if it turns out to
be broken, even if that amounts to a full rewrite.  Review is welcome,
but I honestly believe it's a good idea to get this into the tree
sooner rather than later at this point, because automated regression
testing falls to pieces without these changes, and I believe that
automated regression testing is a really good idea to shake out
whatever bugs we may have in the parallel query stuff.  The code in
this patch is all mine, but Amit Kapila deserves credit as co-author
for doing a lot of prototyping (that ended up getting tossed) and
testing.  This patch includes comments and an addition to
src/backend/storage/lmgr/README which explain in more detail what this
patch does, how it does it, and why that's OK.

force-parallel-mode-v1.patch is what adds the actual infrastructure
for automated testing.  You can set force_parallel_mode=on to force
queries to be ru in a worker whenever possible; this can help test
whether your user-defined functions have been erroneously labeled as
PARALLEL SAFE.  If they error out or misbehave with this setting
enabled, you should label them PARALLEL RESTRICTED or PARALLEL UNSAFE.
If you set force_parallel_mode=regress, then some additional changes
intended specifically for regression testing kick in; those changes
are intended to ensure that you get exactly the same output from
running the regression tests with the parallelism infrastructure
forcibly enabled that you would have gotten anyway.  Most of this code
is mine, but there are also contributions from Amit Kapila and Rushabh
Lathia.

With both of these patches, you can create a file that says:

force_parallel_mode=regress
max_parallel_degree=2

Then you can run: make check-world TEMP_CONFIG=/path/to/aforementioned/file

If you do, you'll find that while the core regression tests pass
(whee!) the pg_upgrade regression tests fail (oops) because of a
pre-existing bug in the parallelism code introduced by neither of
these two patches.  I'm not exactly sure how to fix that bug yet - I
have a couple of ideas - but I think the fact that this test code
promptly found a bug is good sign that it provides enough test
coverage to be useful.  Sticking a Gather node on top of every query
where it looks safe just turns out to exercise a lot of things: the
code that decides whether it's safe to put that Gather node, the code
to launch and manage parallel workers, the code those workers
themselves run, etc.  The point is just to force as much of the
parallel code to be used as possible even when it's not expected to
make anything faster.

test-group-locking-v1.patch is useful for testing possible deadlock
scenarios with the group locking patch.  It's not otherwise safe to
use this, like, at all, and the patch is not proposed for commit.
This patch is entirely by Amit Kapila.

In addition to what's in these patches, I'd like to add a new chapter
to the documentation explaining which queries can be parallelized and
in what ways, what the restrictions are that keep parallel query from
getting used, and some high-level details of how parallelism "works"
in PostgreSQL from a user perspective.  Things will obviously change
here as we get more capabilities, but I think we're at a point where
it makes sense to start putting this together.  What I'm less clear
about is where exactly in the current SGML documentation such a new
chapter might fit; suggestions very welcome.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Ununsed member in printQueryOpt
Следующее
От: Michael Banck
Дата:
Сообщение: Re: PostgreSQL Auditing