Re: a raft of parallelism-related bug fixes
От | Robert Haas |
---|---|
Тема | Re: a raft of parallelism-related bug fixes |
Дата | |
Msg-id | CA+TgmoYd53hV7U1-qSTmaQ1UEmk+j_0EaGaQ1iMPLxZd_esxDQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: a raft of parallelism-related bug fixes (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: a raft of parallelism-related bug fixes
(Andres Freund <andres@anarazel.de>)
|
Список | pgsql-hackers |
On Mon, Oct 19, 2015 at 12:02 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no >> special buildfarm support is required - you would just add that to the >> animal's config file, more or less like this: >> >> config_env => >> { >> CPPFLAGS => '-DGRATUITOUSLY_PARALLEL', >> }, >> >> I try to make things easy :-) > > Wow, that's great. So, I'll try to rework the test code I posted > previously into something less hacky, and eventually add a #define > like this so we can run it on the buildfarm. There's a few other > things that need to get done before that really makes sense - like > getting the rest of the bug fix patches committed - otherwise any > buildfarm critters we add will just be permanently red. OK, so after a bit more delay than I would have liked, I now have a working set of patches that we can use to ensure automated testing of the parallel mode infrastructure. I ended up doing something that does not require a #define, so I'll need some guidance on what to do on the BF side given that context. Please find attached three patches, two of them for commit. group-locking-v1.patch is a vastly improved version of the group locking patch that we discussed, uh, extensively last year. I realize that there was a lot of doubt about this approach, but I still believe it's the right approach, I have put a lot of work into making it work correctly, I don't think anyone has come up with a really plausible alternative approach (except one other approach I tried which turned out to work but with significantly more restrictions), and I'm committed to fixing it in whatever way is necessary if it turns out to be broken, even if that amounts to a full rewrite. Review is welcome, but I honestly believe it's a good idea to get this into the tree sooner rather than later at this point, because automated regression testing falls to pieces without these changes, and I believe that automated regression testing is a really good idea to shake out whatever bugs we may have in the parallel query stuff. The code in this patch is all mine, but Amit Kapila deserves credit as co-author for doing a lot of prototyping (that ended up getting tossed) and testing. This patch includes comments and an addition to src/backend/storage/lmgr/README which explain in more detail what this patch does, how it does it, and why that's OK. force-parallel-mode-v1.patch is what adds the actual infrastructure for automated testing. You can set force_parallel_mode=on to force queries to be ru in a worker whenever possible; this can help test whether your user-defined functions have been erroneously labeled as PARALLEL SAFE. If they error out or misbehave with this setting enabled, you should label them PARALLEL RESTRICTED or PARALLEL UNSAFE. If you set force_parallel_mode=regress, then some additional changes intended specifically for regression testing kick in; those changes are intended to ensure that you get exactly the same output from running the regression tests with the parallelism infrastructure forcibly enabled that you would have gotten anyway. Most of this code is mine, but there are also contributions from Amit Kapila and Rushabh Lathia. With both of these patches, you can create a file that says: force_parallel_mode=regress max_parallel_degree=2 Then you can run: make check-world TEMP_CONFIG=/path/to/aforementioned/file If you do, you'll find that while the core regression tests pass (whee!) the pg_upgrade regression tests fail (oops) because of a pre-existing bug in the parallelism code introduced by neither of these two patches. I'm not exactly sure how to fix that bug yet - I have a couple of ideas - but I think the fact that this test code promptly found a bug is good sign that it provides enough test coverage to be useful. Sticking a Gather node on top of every query where it looks safe just turns out to exercise a lot of things: the code that decides whether it's safe to put that Gather node, the code to launch and manage parallel workers, the code those workers themselves run, etc. The point is just to force as much of the parallel code to be used as possible even when it's not expected to make anything faster. test-group-locking-v1.patch is useful for testing possible deadlock scenarios with the group locking patch. It's not otherwise safe to use this, like, at all, and the patch is not proposed for commit. This patch is entirely by Amit Kapila. In addition to what's in these patches, I'd like to add a new chapter to the documentation explaining which queries can be parallelized and in what ways, what the restrictions are that keep parallel query from getting used, and some high-level details of how parallelism "works" in PostgreSQL from a user perspective. Things will obviously change here as we get more capabilities, but I think we're at a point where it makes sense to start putting this together. What I'm less clear about is where exactly in the current SGML documentation such a new chapter might fit; suggestions very welcome. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: