Обсуждение: a raft of parallelism-related bug fixes

Поиск
Список
Период
Сортировка

a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
My recent commit of the Gather executor node has made it relatively
simple to write code that does an end-to-end test of all of the
parallelism-relate commits which have thus far gone into the tree.
Specifically, what I've done is hacked the planner to push a
single-copy Gather node on top of every plan that is thought to be
parallel-safe, and then run 'make check'.  This turned up bugs in
nearly every parallelism-related commit that has thus far gone into
the tree, which is a little depressing, especially because some of
them are what we've taken to calling brown paper bag bugs.  The good
news is that, with one or two exceptions, these are pretty much just
trivial oversights which are simple to fix, rather than any sort of
deeper design issue.  Attached are 14 patches.  Patches #1-#4 are
essential for testing purposes but are not proposed for commit,
although some of the code they contain may eventually become part of
other patches which are proposed for commit.  Patches #5-#12 are
largely boring patches fixing fairly uninteresting mistakes; I propose
to commit these on an expedited basis.  Patches #13-14 are also
proposed for commit but seem to me to be more in need of review.  With
all of these patches, I can now get a clean 'make check' run, although
I think there are a few bugs remaining to be fixed because some of my
colleagues still experience misbehavior even with all of these patches
applied.  The patch stack is also posted here; the branch is subject
to rebasing:

http://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/gathertest

Here follows an overview of each individual patch (see also commit
messages within).

== For Testing Only ==

0001-Test-code.patch is the basic test code.  In addition to pushing a
Gather node on top of apparently-safe parallel plans, it also ignores
that Gather node when generating EXPLAIN output and suppressing
parallel context in error messages, changes which are essential to
getting the regression tests to pass.  I'm wondering if the parallel
context error ought to be GUC-controlled, defaulting to on but capable
of being enabled on request.

0002-contain_parallel_unsafe-check_parallel_safety.patch and
0003-Temporary-hack-to-reduce-testing-failures.patch arrange NOT to
put Gather nodes on top of plans that contain parallel-restricted
operations or refer to temporary tables.  Although such things can
exist in a parallel plan, they must be above every Gather node, not
beneath it.  Here, the Gather node is being placed (strictly for
testing purposes) at the very top, so we must not insert it at all if
these things are present.

0004-Partial-group-locking-implementation.patch is a partial
implementation of group locking.  I found that without this, the
regression tests hang frequently, and a clean run is impossible. This
patch doesn't modify the deadlock detector, and it doesn't take any
account of locks that should be mutually exclusive even as between
members of a parallel group, but it's enough for a clean regression
test run.  We will need a full solution to this problem soon enough,
but right now I am only using this to find such unrelated bugs as we
may have.

== Proposed For Commit ==

0005-Don-t-send-protocol-messages-to-a-shm_mq-that-no-lon.patch fixes
a problem in the parallel worker shutdown sequence: a background
worker can choose to redirect messages that would normally be sent to
a client to a shm_mq, and parallel workers always do this.  But if the
worker generates a message after the DSM has been detached, it causes
a server crash.

0006-Transfer-current-command-counter-ID-to-parallel-work.patch fixes
a problem in the code used to set up a parallel worker's transaction
state.  The command counter is presently not copied to the worker.
This is awfully embarrassing and should have been caught in the
testing of the parallel mode/contexts patch, but I got overly focused
on the stuff stored inside TransactionStateData.  Don't shoot.

0007-Tighten-up-application-of-parallel-mode-checks.patch fixes
another problem with the parallel mode checks, which are intended to
catch people doing unsafe things and throw errors instead of letting
them crash the server.  Investigation reveals that they don't have
this effect because parallel workers were running their pre-commit
sequence with the checks disabled.  If they do something like try to
send notifications, it can lead to the worker getting an XID
assignment even though the master doesn't have one.  That's really
bad, and crashes the server.  That specific example should be
prohibited anyway (see patch #11) but even if we fix that I think this
is good tightening to prevent unpleasant surprises in the future.

0008-Invalidate-caches-after-cranking-up-a-parallel-worke.patch
invalidates invalidates system cache entries after cranking up a
parallel worker transaction.  This is needed here for the same reason
that the logical decoding code needs to do it after time traveling:
otherwise, the worker might have leftover entries in its caches as a
result of the startup transaction that are now bogus given the changes
in what it can see.

0009-Fix-a-problem-with-parallel-workers-being-unable-to-.patch fixes
a problem with workers being unable to precisely recreate the
authorization state as it existed in the parallel leader. They need to
do that, or else it's a security vulnerability.

0010-Prohibit-parallel-query-when-the-isolation-level-is-.patch
prohibits parallel query at the serializable isolation level.  This is
of course a restriction we'd rather not have, but it's a necessary one
for now because the predicate locking code doesn't understand the idea
of multiple processes with separate PGPROC structures being part of a
single transaction.

0011-Mark-more-functions-parallel-restricted-or-parallel-.patch marks
some functions as parallel-restricted or parallel-unsafe that in fact
are, but were not so marked by the commit that introduced the new
pg_proc flag.  This includes functions for sending notifications and a
few others.

0012-Rewrite-interaction-of-parallel-mode-with-parallel-e.patch
rejiggers the timing of enabling and disabling parallel mode when we
are attempting parallel execution.  The old coding turned out to be
fragile in multiple ways.  Since it's impractical to know at planning
time with ExecutorRun will be called with a non-zero tuple count, this
patch instead observes whether or not this happens, and if it does
happen, the parallel plan is forced to run serially.  In the old
coding, it instead just killed the parallel workers at the end of
ExecutorRun and therefore returned an incomplete result set.  There
might be some further rejiggering that could be done here that would
be even better than this, but I'm fairly certain this is better than
what we've got in the tree right now.

0013-Modify-tqueue-infrastructure-to-support-transient-re.patch
attempts to address a deficiency in the tqueue.c/tqueue.h machinery I
recently introduced: backends can have ephemeral record types for
which they use backend-local typmods that may not be the same between
the leader and the worker.  This patch has the worker send metadata
about the tuple descriptor for each such type, and the leader
registers the same tuple descriptor and then remaps the typmods from
the worker's typmod space to its own.  This seems to work, but I'm a
little concerned that there may be cases it doesn't cover.  Also,
there's room to question the overall approach.  The only other
alternative that springs readily to mind is to try to arrange things
during the planning phase so that we never try to pass records between
parallel backends in this way, but that seems like it would be hard to
code (and thus likely to have bugs) and also pretty limiting.

0014-Fix-problems-with-ParamListInfo-serialization-mechan.patch, which
I just posted on the Parallel Seq Scan thread as a standalone patch,
fixes pretty much what the name of the file suggests.  This actually
fixes two problems, one of which Noah spotted and commented on over on
that thread.  By pure coincidence, the last 'make check' regression
failure I was still troubleshooting needed a fix for that issue plus a
fix to plpgsql_param_fetch.  However, as I mentioned on the other
thread, I'm not quite sure which way to go with the change to
plpgsql_param_fetch so scrutiny of that point, in particular, would be
appreciated.  See also
http://www.postgresql.org/message-id/CA+TgmobN=wADVaUTwsH-xqvCdovkeRasuXw2k3R6vmpWig7raw@mail.gmail.com

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Oct 12, 2015 at 1:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Attached are 14 patches.  Patches #1-#4 are
> essential for testing purposes but are not proposed for commit,
> although some of the code they contain may eventually become part of
> other patches which are proposed for commit.  Patches #5-#12 are
> largely boring patches fixing fairly uninteresting mistakes; I propose
> to commit these on an expedited basis.  Patches #13-14 are also
> proposed for commit but seem to me to be more in need of review.

Hearing no objections, I've now gone and committed #5-#12.

> 0013-Modify-tqueue-infrastructure-to-support-transient-re.patch
> attempts to address a deficiency in the tqueue.c/tqueue.h machinery I
> recently introduced: backends can have ephemeral record types for
> which they use backend-local typmods that may not be the same between
> the leader and the worker.  This patch has the worker send metadata
> about the tuple descriptor for each such type, and the leader
> registers the same tuple descriptor and then remaps the typmods from
> the worker's typmod space to its own.  This seems to work, but I'm a
> little concerned that there may be cases it doesn't cover.  Also,
> there's room to question the overall approach.  The only other
> alternative that springs readily to mind is to try to arrange things
> during the planning phase so that we never try to pass records between
> parallel backends in this way, but that seems like it would be hard to
> code (and thus likely to have bugs) and also pretty limiting.

I am still hoping someone will step up to review this.

> 0014-Fix-problems-with-ParamListInfo-serialization-mechan.patch, which
> I just posted on the Parallel Seq Scan thread as a standalone patch,
> fixes pretty much what the name of the file suggests.  This actually
> fixes two problems, one of which Noah spotted and commented on over on
> that thread.  By pure coincidence, the last 'make check' regression
> failure I was still troubleshooting needed a fix for that issue plus a
> fix to plpgsql_param_fetch.  However, as I mentioned on the other
> thread, I'm not quite sure which way to go with the change to
> plpgsql_param_fetch so scrutiny of that point, in particular, would be
> appreciated.  See also
> http://www.postgresql.org/message-id/CA+TgmobN=wADVaUTwsH-xqvCdovkeRasuXw2k3R6vmpWig7raw@mail.gmail.com

Noah's been helping with this issue on the other thread.  I'll revise
this patch along the lines discussed there and resubmit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Simon Riggs
Дата:
On 12 October 2015 at 18:04, Robert Haas <robertmhaas@gmail.com> wrote:
 
My recent commit of the Gather executor node has made it relatively
simple to write code that does an end-to-end test of all of the
parallelism-relate commits which have thus far gone into the tree.

I've been wanting to help here for a while, but time remains limited for next month or so.

From reading this my understanding is that there isn't a test suite included with this commit?

I've tried to review the Gather node commit and I note that the commit message contains a longer description of the functionality in that patch than any comments in the patch as a whole. No design comments, no README, no file header comments. For such a major feature that isn't acceptable - I would reject a patch from others on that basis alone (and have done so). We must keep the level of comments high if we are to encourage wider participation in the project.

So reviewing patch 13 isn't possible without prior knowledge.

Hoping we'll be able to find some time on this at PGConf.eu; thanks for coming over.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Sat, Oct 17, 2015 at 9:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> From reading this my understanding is that there isn't a test suite included
> with this commit?

Right.  The patches on the thread contain code that can be used for
testing, but the committed code does not itself include test coverage.
I welcome thoughts on how we could perform automated testing of this
code.  I think at least part of the answer is that I need to press on
toward getting the rest of Amit's parallel sequential scan patch
committed, because then this becomes a user-visible feature and I
expect that to make it much easier to find whatever bugs remain.  A
big part of the difficulty in testing this up until now is that I've
been building towards, hey, we have parallel query.  Until we actually
do, you need to write C code to test this, which raises the bar
considerably.

Now, that does not mean we shouldn't test this in other ways, and of
course I have, and so have Amit and other people from the community -
of late, Noah Misch and Haribabu Kommi have found several bugs through
code inspection and testing, which included some of the same ones that
I was busy finding and fixing using the test code attached to this
thread.  That's one of the reasons why I wanted to press forward with
getting the fixes for those bugs committed. It's just a waste of
everybody's time if we re-finding known bugs for which fixes already
exist.

But the question of how to test this in the buildfarm is a good one,
and I don't have a complete answer.  Once the rest of this goes in,
which I hope will be soon, we can EXPLAIN or EXPLAIN ANALYZE or just
straight up run parallel queries in the regression test suite and see
that they behave as expected.  But I don't expect that to provide
terribly good test coverage.  One idea that I think would provide
*excellent* test coverage is to take the test code included on this
thread and run it on the buildfarm.  The idea of the code is to
basically run the regression test suite with every parallel-eligible
query forced to unnecessarily use parallelism.  Turning that and
running 'make check' found, directly or indirectly, all of these bugs.
Doing that on the whole buildfarm would probably find more.

However, I'm pretty sure that we don't want to switch the *entire*
buildfarm to using lots of unnecessary parallelism.  What we might be
able to do is have some critters that people spin up for this precise
purpose.  Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
members, we could have GRATUITOUSLY_PARALLEL buildfarm members.  If
Andrew is willing to add buildfarm support for that option and a few
people are willing to run critters in that mode, I will be happy -
more than happy, really - to put the test code into committable form,
guarded by a #define, and away we go.

Of course, other ideas for testing are also welcome.

> I've tried to review the Gather node commit and I note that the commit
> message contains a longer description of the functionality in that patch
> than any comments in the patch as a whole. No design comments, no README, no
> file header comments. For such a major feature that isn't acceptable - I
> would reject a patch from others on that basis alone (and have done so). We
> must keep the level of comments high if we are to encourage wider
> participation in the project.

It's good to have your perspective on how this can be improved, and
I'm definitely willing to write more documentation.  Any lack in that
area is probably due to being too close to the subject area, having
spent several years on parallelism in general, and 200+ emails on
parallel sequential scan in particular.  Your point about the lack of
a good header file comment for execParallel.c is a good one, and I'll
rectify that next week.

It's worth noting, though, that the executor files in general don't
contain great gobs of comments, and the executor README even has this
vintage 2001 comment:

XXX a great deal more documentation needs to be written here...

Well, yeah.  It's taken me a long time to understand how the executor
actually works, and there are parts of it - particularly related to
EvalPlanQual - that I still don't fully understand.  So some of the
lack of comments in, for example, nodeGather.c is because it copies
the style of other executor nodes, like nodeSeqscan.c.  It's not
exactly clear to me what more to document there.  You either
understand what a rescan node is, in which case the code for each
node's rescan method tends to be fairly self-evident, or you don't -
but that clearly shouldn't be re-explained in every file.  So I guess
what I'm saying is I could use some advice on what kinds things would
be most useful to document, and where to put that documentation.

Right now, the best explanation of how parallelism works is in
src/backend/access/transam/README.parallel -- but, as you rightly
point out, that doesn't cover the executor bits.  Should we have SGML
documentation under "VII. Internals" that explains what's under the
hood in the same way that we have sections for "Database Physical
Storage" and "PostgreSQL Coding Conventions"?  Should the stuff in the
existing README.parallel be moved there?  Or I could just add some
words to src/backend/executor/README to cover the parallel executor
stuff, if that is preferred.  Advice?

Also, regardless of how we document what's going on at the code level,
I think we probably should have a section *somewhere* in the main SGML
documentation that kind of explains the general concepts behind
parallel query from a user/DBA perspective.  But I don't know where to
put it.  Under "Server Administration"?  Exactly what to explain there
needs some thought, too.  I'm sort of wondering if we need two
chapters in the documentation on this, one that covers it from a
user/DBA perspective and the other of which covers it from a hacker
perspective.  But then maybe the hacker stuff should just go in README
files.  I'm not sure.  I may have to try writing some of this and see
how it goes, but advice is definitely appreciated.

I am happy to definitively commit to writing whatever documentation
the community feels is necessary here, and I will do that certainly
before end of development for 9.6 and hopefully much sooner than that.
I will do that even if I don't get any specific feedback on what to
write and where to put it, but the more feedback I get, the better the
result will probably be.  Some of the reason this hasn't been done
already is because we're still getting the infrastructure into place,
and we're fixing and adjusting things as we go along, so while the
overall picture isn't changing much, there are bits of the design that
are still in flux as we realize, oh, crap, that was a dumb idea.  As
we get a clearer idea what will be in 9.6, it will get easier to
present the overall picture in a coherent way.

> So reviewing patch 13 isn't possible without prior knowledge.

The basic question for patch 13 is whether ephemeral record types can
occur in executor tuples in any contexts that I haven't identified.  I
know that a tuple table slot can contain have a column that is of type
record or record[], and those records can themselves contain
attributes of type record or record[], and so on as far down as you
like.  I *think* that's the only case.  For example, I don't believe
that a TupleTableSlot can contain a *named* record type that has an
anonymous record buried down inside of it somehow.  But I'm not
positive I'm right about that.

> Hoping we'll be able to find some time on this at PGConf.eu; thanks for
> coming over.

Sure thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Noah Misch
Дата:
On Sat, Oct 17, 2015 at 06:17:37PM -0400, Robert Haas wrote:
> One idea that I think would provide
> *excellent* test coverage is to take the test code included on this
> thread and run it on the buildfarm.  The idea of the code is to
> basically run the regression test suite with every parallel-eligible
> query forced to unnecessarily use parallelism.  Turning that and
> running 'make check' found, directly or indirectly, all of these bugs.
> Doing that on the whole buildfarm would probably find more.
> 
> However, I'm pretty sure that we don't want to switch the *entire*
> buildfarm to using lots of unnecessary parallelism.  What we might be
> able to do is have some critters that people spin up for this precise
> purpose.  Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
> members, we could have GRATUITOUSLY_PARALLEL buildfarm members.  If
> Andrew is willing to add buildfarm support for that option and a few

What, if anything, would this mode require beyond adding a #define?  If
nothing, it won't require specific support in the buildfarm script.
CLOBBER_CACHE_ALWAYS has no specific support.

> people are willing to run critters in that mode, I will be happy -
> more than happy, really - to put the test code into committable form,
> guarded by a #define, and away we go.

I would make one such animal.



Re: a raft of parallelism-related bug fixes

От
Stephen Frost
Дата:
* Noah Misch (noah@leadboat.com) wrote:
> On Sat, Oct 17, 2015 at 06:17:37PM -0400, Robert Haas wrote:
> > people are willing to run critters in that mode, I will be happy -
> > more than happy, really - to put the test code into committable form,
> > guarded by a #define, and away we go.
>
> I would make one such animal.

We're also looking at what animals it makes sense to run as part of
pginfra and I expect we'd be able to include an animal for these tests
also (though Stefan is the one really driving that effort).

Thanks!

Stephen

Re: a raft of parallelism-related bug fixes

От
Andrew Dunstan
Дата:

On 10/17/2015 06:17 PM, Robert Haas wrote:

>
> However, I'm pretty sure that we don't want to switch the *entire*
> buildfarm to using lots of unnecessary parallelism.  What we might be
> able to do is have some critters that people spin up for this precise
> purpose.  Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm
> members, we could have GRATUITOUSLY_PARALLEL buildfarm members.  If
> Andrew is willing to add buildfarm support for that option and a few
> people are willing to run critters in that mode, I will be happy -
> more than happy, really - to put the test code into committable form,
> guarded by a #define, and away we go.
>
>


If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no 
special buildfarm support is required - you would just add that to the 
animal's config file, more or less like this:
     config_env =>     {         CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',     },

I try to make things easy :-)


cheers

andrew



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
> special buildfarm support is required - you would just add that to the
> animal's config file, more or less like this:
>
>      config_env =>
>      {
>          CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
>      },
>
> I try to make things easy :-)

Wow, that's great.  So, I'll try to rework the test code I posted
previously into something less hacky, and eventually add a #define
like this so we can run it on the buildfarm.  There's a few other
things that need to get done before that really makes sense - like
getting the rest of the bug fix patches committed - otherwise any
buildfarm critters we add will just be permanently red.

Thanks to Noah and Stephen for your replies also - it is good to hear
that if I spend the time to make this committable, somebody will use
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> It's good to have your perspective on how this can be improved, and
> I'm definitely willing to write more documentation.  Any lack in that
> area is probably due to being too close to the subject area, having
> spent several years on parallelism in general, and 200+ emails on
> parallel sequential scan in particular.  Your point about the lack of
> a good header file comment for execParallel.c is a good one, and I'll
> rectify that next week.

Here is a patch to add a hopefully-useful file header comment to
execParallel.c.  I included one for nodeGather.c as well, which seems
to be contrary to previous practice, but actually it seems like
previous practice is not the greatest: surely it's not self-evident
what all of the executor nodes do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: a raft of parallelism-related bug fixes

От
Simon Riggs
Дата:
On 17 October 2015 at 18:17, Robert Haas <robertmhaas@gmail.com> wrote:
 
It's good to have your perspective on how this can be improved, and
I'm definitely willing to write more documentation.  Any lack in that
area is probably due to being too close to the subject area, having
spent several years on parallelism in general, and 200+ emails on
parallel sequential scan in particular.  Your point about the lack of
a good header file comment for execParallel.c is a good one, and I'll
rectify that next week.

Not on your case in a big way, just noting the need for change there.

I'll help as well, but if you could start with enough basics to allow me to ask questions that will help. Thanks. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: a raft of parallelism-related bug fixes

От
Amit Kapila
Дата:
On Tue, Oct 20, 2015 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > It's good to have your perspective on how this can be improved, and
> > I'm definitely willing to write more documentation.  Any lack in that
> > area is probably due to being too close to the subject area, having
> > spent several years on parallelism in general, and 200+ emails on
> > parallel sequential scan in particular.  Your point about the lack of
> > a good header file comment for execParallel.c is a good one, and I'll
> > rectify that next week.
>
> Here is a patch to add a hopefully-useful file header comment to
> execParallel.c.  I included one for nodeGather.c as well, which seems
> to be contrary to previous practice, but actually it seems like
> previous practice is not the greatest: surely it's not self-evident
> what all of the executor nodes do.
>

+ * any ParamListInfo associated witih the query, buffer usage info, and
+ * the actual plan to be passed down to the worker.

typo 'witih'.

+ * return the results.  Therefore, a plan used with a single-copy Gather
+ * node not be parallel-aware.

"node not" seems to be incomplete.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: a raft of parallelism-related bug fixes

От
Amit Langote
Дата:


On Wednesday, 21 October 2015, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Oct 20, 2015 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Oct 17, 2015 at 6:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > It's good to have your perspective on how this can be improved, and
> > I'm definitely willing to write more documentation.  Any lack in that
> > area is probably due to being too close to the subject area, having
> > spent several years on parallelism in general, and 200+ emails on
> > parallel sequential scan in particular.  Your point about the lack of
> > a good header file comment for execParallel.c is a good one, and I'll
> > rectify that next week.
>
> Here is a patch to add a hopefully-useful file header comment to
> execParallel.c.  I included one for nodeGather.c as well, which seems
> to be contrary to previous practice, but actually it seems like
> previous practice is not the greatest: surely it's not self-evident
> what all of the executor nodes do.
>

+ * any ParamListInfo associated witih the query, buffer usage info, and
+ * the actual plan to be passed down to the worker.

typo 'witih'.

+ * return the results.  Therefore, a plan used with a single-copy Gather
+ * node not be parallel-aware.

"node not" seems to be incomplete.

... node *need* not be parallel aware?

Thanks,
Amit 

Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Wed, Oct 21, 2015 at 9:04 AM, Amit Langote <amitlangote09@gmail.com> wrote:
> ... node *need* not be parallel aware?

Yes, thanks.  Committed that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Tue, Oct 20, 2015 at 6:12 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Not on your case in a big way, just noting the need for change there.

Yes, I appreciate your attitude.  I think we are on the same wavelength.

> I'll help as well, but if you could start with enough basics to allow me to
> ask questions that will help. Thanks.

Will try to keep pushing in that direction.  May be easier once some
of the dust has settled.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Sun, Oct 18, 2015 at 12:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> So reviewing patch 13 isn't possible without prior knowledge.
>
> The basic question for patch 13 is whether ephemeral record types can
> occur in executor tuples in any contexts that I haven't identified.  I
> know that a tuple table slot can contain have a column that is of type
> record or record[], and those records can themselves contain
> attributes of type record or record[], and so on as far down as you
> like.  I *think* that's the only case.  For example, I don't believe
> that a TupleTableSlot can contain a *named* record type that has an
> anonymous record buried down inside of it somehow.  But I'm not
> positive I'm right about that.

I have done some more testing and investigation and determined that
this optimism was unwarranted.  It turns out that the type information
for composite and record types gets stored in two different places.
First, the TupleTableSlot has a type OID, indicating the sort of the
value it expects to be stored for that slot attribute.  Second, the
value itself contains a type OID and typmod.  And these don't have to
match.  For example, consider this query:

select row_to_json(i) from int8_tbl i(x,y);

Without i(x,y), the HeapTuple passed to row_to_json is labelled with
the pg_type OID of int8_tbl.  But with the query as written, it's
labeled as an anonymous record type.  If I jigger things by hacking
the code so that this is planned as Gather (single-copy) -> SeqScan,
with row_to_json evaluated at the Gather node, then the sequential
scan kicks out a tuple with a transient record type and stores it into
a slot whose type OID is still that of int8_tbl.  My previous patch
failed to deal with that; the attached one does.

The previous patch was also defective in a few other respects.  The
most significant of those, maybe, is that it somehow thought it was OK
to assume that transient typmods from all workers could be treated
interchangeably rather than individually.  To fix this, I've changed
the TupleQueueFunnel implemented by tqueue.c to be merely a
TupleQueueReader which handles reading from a single worker only.
nodeGather.c therefore creates one TupleQueueReader per worker instead
of a single TupleQueueFunnel for all workers; accordingly, the logic
for multiplexing multiple queues now lives in nodeGather.c.  This is
probably how I should have done it originally - someone, I think Jeff
Davis - complained previously that tqueue.c had no business embedding
the round-robin policy decision, and he was right.  So this addresses
that complaint as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Wed, Oct 28, 2015 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 18, 2015 at 12:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> So reviewing patch 13 isn't possible without prior knowledge.
>>
>> The basic question for patch 13 is whether ephemeral record types can
>> occur in executor tuples in any contexts that I haven't identified.  I
>> know that a tuple table slot can contain have a column that is of type
>> record or record[], and those records can themselves contain
>> attributes of type record or record[], and so on as far down as you
>> like.  I *think* that's the only case.  For example, I don't believe
>> that a TupleTableSlot can contain a *named* record type that has an
>> anonymous record buried down inside of it somehow.  But I'm not
>> positive I'm right about that.
>
> I have done some more testing and investigation and determined that
> this optimism was unwarranted.  It turns out that the type information
> for composite and record types gets stored in two different places.
> First, the TupleTableSlot has a type OID, indicating the sort of the
> value it expects to be stored for that slot attribute.  Second, the
> value itself contains a type OID and typmod.  And these don't have to
> match.  For example, consider this query:
>
> select row_to_json(i) from int8_tbl i(x,y);
>
> Without i(x,y), the HeapTuple passed to row_to_json is labelled with
> the pg_type OID of int8_tbl.  But with the query as written, it's
> labeled as an anonymous record type.  If I jigger things by hacking
> the code so that this is planned as Gather (single-copy) -> SeqScan,
> with row_to_json evaluated at the Gather node, then the sequential
> scan kicks out a tuple with a transient record type and stores it into
> a slot whose type OID is still that of int8_tbl.  My previous patch
> failed to deal with that; the attached one does.
>
> The previous patch was also defective in a few other respects.  The
> most significant of those, maybe, is that it somehow thought it was OK
> to assume that transient typmods from all workers could be treated
> interchangeably rather than individually.  To fix this, I've changed
> the TupleQueueFunnel implemented by tqueue.c to be merely a
> TupleQueueReader which handles reading from a single worker only.
> nodeGather.c therefore creates one TupleQueueReader per worker instead
> of a single TupleQueueFunnel for all workers; accordingly, the logic
> for multiplexing multiple queues now lives in nodeGather.c.  This is
> probably how I should have done it originally - someone, I think Jeff
> Davis - complained previously that tqueue.c had no business embedding
> the round-robin policy decision, and he was right.  So this addresses
> that complaint as well.

Here is an updated version.  This is rebased over recent commits, and
I added a missing check for attisdropped.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Nov 2, 2015 at 9:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 28, 2015 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sun, Oct 18, 2015 at 12:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> So reviewing patch 13 isn't possible without prior knowledge.
>>>
>>> The basic question for patch 13 is whether ephemeral record types can
>>> occur in executor tuples in any contexts that I haven't identified.  I
>>> know that a tuple table slot can contain have a column that is of type
>>> record or record[], and those records can themselves contain
>>> attributes of type record or record[], and so on as far down as you
>>> like.  I *think* that's the only case.  For example, I don't believe
>>> that a TupleTableSlot can contain a *named* record type that has an
>>> anonymous record buried down inside of it somehow.  But I'm not
>>> positive I'm right about that.
>>
>> I have done some more testing and investigation and determined that
>> this optimism was unwarranted.  It turns out that the type information
>> for composite and record types gets stored in two different places.
>> First, the TupleTableSlot has a type OID, indicating the sort of the
>> value it expects to be stored for that slot attribute.  Second, the
>> value itself contains a type OID and typmod.  And these don't have to
>> match.  For example, consider this query:
>>
>> select row_to_json(i) from int8_tbl i(x,y);
>>
>> Without i(x,y), the HeapTuple passed to row_to_json is labelled with
>> the pg_type OID of int8_tbl.  But with the query as written, it's
>> labeled as an anonymous record type.  If I jigger things by hacking
>> the code so that this is planned as Gather (single-copy) -> SeqScan,
>> with row_to_json evaluated at the Gather node, then the sequential
>> scan kicks out a tuple with a transient record type and stores it into
>> a slot whose type OID is still that of int8_tbl.  My previous patch
>> failed to deal with that; the attached one does.
>>
>> The previous patch was also defective in a few other respects.  The
>> most significant of those, maybe, is that it somehow thought it was OK
>> to assume that transient typmods from all workers could be treated
>> interchangeably rather than individually.  To fix this, I've changed
>> the TupleQueueFunnel implemented by tqueue.c to be merely a
>> TupleQueueReader which handles reading from a single worker only.
>> nodeGather.c therefore creates one TupleQueueReader per worker instead
>> of a single TupleQueueFunnel for all workers; accordingly, the logic
>> for multiplexing multiple queues now lives in nodeGather.c.  This is
>> probably how I should have done it originally - someone, I think Jeff
>> Davis - complained previously that tqueue.c had no business embedding
>> the round-robin policy decision, and he was right.  So this addresses
>> that complaint as well.
>
> Here is an updated version.  This is rebased over recent commits, and
> I added a missing check for attisdropped.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Oct 19, 2015 at 12:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
>> special buildfarm support is required - you would just add that to the
>> animal's config file, more or less like this:
>>
>>      config_env =>
>>      {
>>          CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
>>      },
>>
>> I try to make things easy :-)
>
> Wow, that's great.  So, I'll try to rework the test code I posted
> previously into something less hacky, and eventually add a #define
> like this so we can run it on the buildfarm.  There's a few other
> things that need to get done before that really makes sense - like
> getting the rest of the bug fix patches committed - otherwise any
> buildfarm critters we add will just be permanently red.

OK, so after a bit more delay than I would have liked, I now have a
working set of patches that we can use to ensure automated testing of
the parallel mode infrastructure.  I ended up doing something that
does not require a #define, so I'll need some guidance on what to do
on the BF side given that context.  Please find attached three
patches, two of them for commit.

group-locking-v1.patch is a vastly improved version of the group
locking patch that we discussed, uh, extensively last year.  I realize
that there was a lot of doubt about this approach, but I still believe
it's the right approach, I have put a lot of work into making it work
correctly, I don't think anyone has come up with a really plausible
alternative approach (except one other approach I tried which turned
out to work but with significantly more restrictions), and I'm
committed to fixing it in whatever way is necessary if it turns out to
be broken, even if that amounts to a full rewrite.  Review is welcome,
but I honestly believe it's a good idea to get this into the tree
sooner rather than later at this point, because automated regression
testing falls to pieces without these changes, and I believe that
automated regression testing is a really good idea to shake out
whatever bugs we may have in the parallel query stuff.  The code in
this patch is all mine, but Amit Kapila deserves credit as co-author
for doing a lot of prototyping (that ended up getting tossed) and
testing.  This patch includes comments and an addition to
src/backend/storage/lmgr/README which explain in more detail what this
patch does, how it does it, and why that's OK.

force-parallel-mode-v1.patch is what adds the actual infrastructure
for automated testing.  You can set force_parallel_mode=on to force
queries to be ru in a worker whenever possible; this can help test
whether your user-defined functions have been erroneously labeled as
PARALLEL SAFE.  If they error out or misbehave with this setting
enabled, you should label them PARALLEL RESTRICTED or PARALLEL UNSAFE.
If you set force_parallel_mode=regress, then some additional changes
intended specifically for regression testing kick in; those changes
are intended to ensure that you get exactly the same output from
running the regression tests with the parallelism infrastructure
forcibly enabled that you would have gotten anyway.  Most of this code
is mine, but there are also contributions from Amit Kapila and Rushabh
Lathia.

With both of these patches, you can create a file that says:

force_parallel_mode=regress
max_parallel_degree=2

Then you can run: make check-world TEMP_CONFIG=/path/to/aforementioned/file

If you do, you'll find that while the core regression tests pass
(whee!) the pg_upgrade regression tests fail (oops) because of a
pre-existing bug in the parallelism code introduced by neither of
these two patches.  I'm not exactly sure how to fix that bug yet - I
have a couple of ideas - but I think the fact that this test code
promptly found a bug is good sign that it provides enough test
coverage to be useful.  Sticking a Gather node on top of every query
where it looks safe just turns out to exercise a lot of things: the
code that decides whether it's safe to put that Gather node, the code
to launch and manage parallel workers, the code those workers
themselves run, etc.  The point is just to force as much of the
parallel code to be used as possible even when it's not expected to
make anything faster.

test-group-locking-v1.patch is useful for testing possible deadlock
scenarios with the group locking patch.  It's not otherwise safe to
use this, like, at all, and the patch is not proposed for commit.
This patch is entirely by Amit Kapila.

In addition to what's in these patches, I'd like to add a new chapter
to the documentation explaining which queries can be parallelized and
in what ways, what the restrictions are that keep parallel query from
getting used, and some high-level details of how parallelism "works"
in PostgreSQL from a user perspective.  Things will obviously change
here as we get more capabilities, but I think we're at a point where
it makes sense to start putting this together.  What I'm less clear
about is where exactly in the current SGML documentation such a new
chapter might fit; suggestions very welcome.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: a raft of parallelism-related bug fixes

От
Andres Freund
Дата:
On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
> group-locking-v1.patch is a vastly improved version of the group
> locking patch that we discussed, uh, extensively last year.  I realize
> that there was a lot of doubt about this approach, but I still believe
> it's the right approach, I have put a lot of work into making it work
> correctly, I don't think anyone has come up with a really plausible
> alternative approach (except one other approach I tried which turned
> out to work but with significantly more restrictions), and I'm
> committed to fixing it in whatever way is necessary if it turns out to
> be broken, even if that amounts to a full rewrite.  Review is welcome,
> but I honestly believe it's a good idea to get this into the tree
> sooner rather than later at this point, because automated regression
> testing falls to pieces without these changes, and I believe that
> automated regression testing is a really good idea to shake out
> whatever bugs we may have in the parallel query stuff.  The code in
> this patch is all mine, but Amit Kapila deserves credit as co-author
> for doing a lot of prototyping (that ended up getting tossed) and
> testing.  This patch includes comments and an addition to
> src/backend/storage/lmgr/README which explain in more detail what this
> patch does, how it does it, and why that's OK.

I see you pushed group locking support. I do wonder if somebody has
actually reviewed this? On a quick scrollthrough it seems fairly
invasive, touching some parts where bugs are really hard to find.

I realize that this stuff has all been brewing long, and that there's
still a lot to do. So you gotta keep moving. And I'm not sure that
there's anything wrong or if there's any actually better approach. But
pushing an unreviewed, complex patch, that originated in a thread
orginally about different relatively small/mundane items, for a
contentious issue, a few days after the initial post. Hm. Not sure how
you'd react if you weren't the author.

Greetings,

Andres Freund



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
>> group-locking-v1.patch is a vastly improved version of the group
>> locking patch that we discussed, uh, extensively last year.  I realize
>> that there was a lot of doubt about this approach, but I still believe
>> it's the right approach, I have put a lot of work into making it work
>> correctly, I don't think anyone has come up with a really plausible
>> alternative approach (except one other approach I tried which turned
>> out to work but with significantly more restrictions), and I'm
>> committed to fixing it in whatever way is necessary if it turns out to
>> be broken, even if that amounts to a full rewrite.  Review is welcome,
>> but I honestly believe it's a good idea to get this into the tree
>> sooner rather than later at this point, because automated regression
>> testing falls to pieces without these changes, and I believe that
>> automated regression testing is a really good idea to shake out
>> whatever bugs we may have in the parallel query stuff.  The code in
>> this patch is all mine, but Amit Kapila deserves credit as co-author
>> for doing a lot of prototyping (that ended up getting tossed) and
>> testing.  This patch includes comments and an addition to
>> src/backend/storage/lmgr/README which explain in more detail what this
>> patch does, how it does it, and why that's OK.
>
> I see you pushed group locking support. I do wonder if somebody has
> actually reviewed this? On a quick scrollthrough it seems fairly
> invasive, touching some parts where bugs are really hard to find.
>
> I realize that this stuff has all been brewing long, and that there's
> still a lot to do. So you gotta keep moving. And I'm not sure that
> there's anything wrong or if there's any actually better approach. But
> pushing an unreviewed, complex patch, that originated in a thread
> orginally about different relatively small/mundane items, for a
> contentious issue, a few days after the initial post. Hm. Not sure how
> you'd react if you weren't the author.

Probably not very well.  Do you want me to revert it?

I mean, look.  Without that patch, parallel query is definitely
broken.  Just revert the patch and try running the regression tests
with force_parallel_mode=regress and max_parallel_degree>0.  It hangs
all over the place.  With the patch, every regression test suite we
have runs cleanly with those settings.  Without the patch, it's
trivial to construct a test case where parallel query experiences an
undetected deadlock.  With the patch, it appears to work reliably.
Could there bugs someplace?  Yes, there absolutely could.  Do I really
think anybody was going to spend the time to understand deadlock.c
well enough to verify my changes?  No, I don't.  What I think would
have happened is that the patch would have sat around like an
albatross around my neck - totally ignored by everyone - until the end
of the last CF, and then the discussion would have gone one of three
ways:

1. Boy, this patch is complicated and I don't understand it.  Let's
reject it, even though without it parallel query is trivially broken!
Uh, we'll just let parallel query be broken.
2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
3. Oh well, Robert says we need this, I guess we'd better let him commit it.

I don't find any of those options to be better than the status quo.
If the patch is broken, another two months of having in the tree give
us a better chance of finding the bugs, especially because, combined
with the other patch which I also pushed, it enables *actual automated
regression testing* of the parallelism code, which I personally think
is a really good thing - and I'd like to see the buildfarm doing that
as soon as possible, so that we can find some of those bugs before
we're deep in beta.  Not just bugs in group locking but all sorts of
parallelism bugs that might be revealed by end-to-end testing.  The
*entire stack of patches* that began this thread was a response to
problems that were found by the automated testing that you can't do
without this patch.  Those bug fixes resulted in a huge increase in
the robustness of parallel query, and that would not have happened
without this code.  Every single one of those problems, some of them
in commits dating back years, was detected by the same method: run the
regression tests with parallel mode and parallel workers used for
every query for which that seems to be safe.

And, by the way, the patch, aside from the deadlock.c portion, was
posted back in October, admittedly without much fanfare, but nobody
reviewed that or any other patch on this thread.  If I'd waited for
those reviews to come in, parallel query would not be committed now,
nor probably in 9.6, nor probably in 9.7 or 9.8 either.  The whole
project would just be impossible on its face.  It would be impossible
in the first instance if I did not have a commit bit, because there is
just not enough committer bandwidth - even reviewer bandwidth more
generally - to review the number of patches that I've submitted
related to parallelism, so in the end some, perhaps many, of those are
going to be committed mostly on the strength of my personal opinion
that committing them is better than not committing them.  I am going
to have a heck of a lot of egg on my face if it turns out that I've
been too aggressive in pushing this stuff into the tree.  But,
basically, the alternative is that we don't get the feature, and I
think the feature is important enough to justify taking some risk.

I think it's myopic to say "well, but this patch might have bugs".
Very true.  But also, all the other parallelism patches that are
already committed or that are still under review but which can't be
properly tested without this patch might have bugs, too, so you've got
to weigh the risk that this patch might get better if I wait longer to
commit it against the possibility that not having committed it reduces
the chances of finding bugs elsewhere.  I don't want it to seem like
I'm forcing this down the community's throat - I don't have a right to
do that, and I will certainly revert this patch if that is the
consensus.  But that is not what I think best myself.  What I think
would be better is to (1) make an effort to get the buildfarm testing
which this patch enables up and running as soon as possible and (2)
for somebody to read over the committed code and raise any issues that
they find.  Or, for that matter, to read over the committed code for
any of the *other* parallelism patches and raise any issues that they
find with *that* code.  There's certainly scads of code here and this
is far from the only bit that might have bugs.

Oh: another thing that I would like to do is commit the isolation
tests I wrote for the deadlock detector a while back, which nobody has
reviewed either, though Tom and Alvaro seemed reasonably positive
about the concept.  Right now, the deadlock.c part of this patch isn't
tested at all by any of our regression test suites, because NOTHING in
deadlock.c is tested by any of our regression test suites.  You can
blow it up with dynamite and the regression tests are perfectly happy,
and that's pretty scary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
"Joshua D. Drake"
Дата:
On 02/08/2016 10:45 AM, Robert Haas wrote:
> On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-02-02 15:41:45 -0500, Robert Haas wrote:

>> I realize that this stuff has all been brewing long, and that there's
>> still a lot to do. So you gotta keep moving. And I'm not sure that
>> there's anything wrong or if there's any actually better approach. But
>> pushing an unreviewed, complex patch, that originated in a thread
>> orginally about different relatively small/mundane items, for a
>> contentious issue, a few days after the initial post. Hm. Not sure how
>> you'd react if you weren't the author.
>
> Probably not very well.  Do you want me to revert it?

If I am off base, please feel free to yell Latin at me again but isn't 
this exactly what different trees are for in Git? Would it be possible 
to say:

Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism 
fixes do"?

I can't review this patch but I can run a test suite on a number of 
platforms and see if it behaves as expected.


> albatross around my neck - totally ignored by everyone - until the end
> of the last CF, and then the discussion would have gone one of three
> ways:
>
> 1. Boy, this patch is complicated and I don't understand it.  Let's
> reject it, even though without it parallel query is trivially broken!
> Uh, we'll just let parallel query be broken.
> 2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
> 3. Oh well, Robert says we need this, I guess we'd better let him commit it.

4. We need to push the release so we can test this.

>
> I don't find any of those options to be better than the status quo.
> If the patch is broken, another two months of having in the tree give
> us a better chance of finding the bugs, especially because, combined

I think this further points to the need for more reviewers and less 
feature pushes. There are fundamental features that we could use, this 
is one of them. It is certainly more important than say pgLogical or BDR 
(not that those aren't useful but that we do have external solutions for 
that problem).


> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.  Right now, the deadlock.c part of this patch isn't
> tested at all by any of our regression test suites, because NOTHING in
> deadlock.c is tested by any of our regression test suites.  You can
> blow it up with dynamite and the regression tests are perfectly happy,
> and that's pretty scary.

Test test test. Please commit.

Sincerely,

JD



-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Feb 8, 2016 at 2:00 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> If I am off base, please feel free to yell Latin at me again but isn't this
> exactly what different trees are for in Git? Would it be possible to say:
>
> Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism
> fixes do"?
>
> I can't review this patch but I can run a test suite on a number of
> platforms and see if it behaves as expected.

Sure, I'd love to have the ability to push a branch into the buildfarm
and have the tests get run on all the buildfarm machines and let that
bake for a while before putting it into the main tree.  The problem
here is that the complicated part of this patch is something that's
only going to be tested in very rare cases.  The simple part of the
patch, which handles the simple-deadlock case, is easy to hit,
although apparently zero people other than Amit and I have found it in
the few months since parallel sequential scan was committed, which
makes me thing people haven't tried very hard to break any part of
parallel query, which is a shame.  The really hairy part is in
deadlock.c, and it's actually very hard to hit that case.  It won't be
hit in real life except in pretty rare circumstances.  So testing is
good, but you not only need to know what you are testing but probably
have an automated tool that can run the test a gazillion times in a
loop, or be really clever and find a test case that Amit and I didn't
foresee.  And the reality is that getting anybody independent of the
parallel query effort to take an interested in deep testing has not
gone anywhere at all up until now.  I'd be happy for that change,
whether because of this commit or for any other reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.

Possibly the reason that wasn't reviewed is that it's not in the
commitfest list (or at least if it is, I sure don't see it).

Having said that, I don't have much of a problem with you pushing it
anyway, unless it will add 15 minutes to make check-world or some such.
        regards, tom lane



Re: a raft of parallelism-related bug fixes

От
"Joshua D. Drake"
Дата:
On 02/08/2016 11:24 AM, Robert Haas wrote:
> On Mon, Feb 8, 2016 at 2:00 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> If I am off base, please feel free to yell Latin at me again but isn't this
>> exactly what different trees are for in Git? Would it be possible to say:
>>
>> Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism
>> fixes do"?
>>
>> I can't review this patch but I can run a test suite on a number of
>> platforms and see if it behaves as expected.
>
> Sure, I'd love to have the ability to push a branch into the buildfarm
> and have the tests get run on all the buildfarm machines and let that
> bake for a while before putting it into the main tree.  The problem
> here is that the complicated part of this patch is something that's
> only going to be tested in very rare cases.  The simple part of the


I have no problem running any test cases you wish on a branch in a loop 
for the next week and reporting back any errors.

Where this gets tricky is the tooling itself. For me to be able to do so 
(and others really) I need to be able to do this:
* Download (preferably a tarball but I can do a git pull)* Exact instructions on how to set up the tests* Exact
instructionson how to run the tests        * Exact instructions on how to report the tests
 

If anyone takes the time to do that, I will take the time and resources 
to run them.

What I can't do, is fiddle around trying to figure out how to set this 
stuff up. I don't have the time and it isn't productive for me. I don't 
think I am the only one in this boat.

Let's be honest, a lot of people won't even bother to play with this 
even though it is easily one of the best features we have coming for 9.6 
until we release 9.6.0. That is a bad time to be testing.

The easier we make it for people like me, practitioners to test, the 
better it is for the whole project.

Sincerely,

JD




-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: a raft of parallelism-related bug fixes

От
Peter Geoghegan
Дата:
On Mon, Feb 8, 2016 at 10:45 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> And, by the way, the patch, aside from the deadlock.c portion, was
> posted back in October, admittedly without much fanfare, but nobody
> reviewed that or any other patch on this thread.  If I'd waited for
> those reviews to come in, parallel query would not be committed now,
> nor probably in 9.6, nor probably in 9.7 or 9.8 either.  The whole
> project would just be impossible on its face.  It would be impossible
> in the first instance if I did not have a commit bit, because there is
> just not enough committer bandwidth - even reviewer bandwidth more
> generally - to review the number of patches that I've submitted
> related to parallelism, so in the end some, perhaps many, of those are
> going to be committed mostly on the strength of my personal opinion
> that committing them is better than not committing them.  I am going
> to have a heck of a lot of egg on my face if it turns out that I've
> been too aggressive in pushing this stuff into the tree.  But,
> basically, the alternative is that we don't get the feature, and I
> think the feature is important enough to justify taking some risk.

FWIW, I appreciate your candor. However, I think that you could have
done a better job of making things easier for reviewers, even if that
might not have made an enormous difference. I suspect I would have not
been able to get UPSERT done as a non-committer if it wasn't for the
epic wiki page, that made it at least possible for someone to jump in.

To be more specific, I thought it was really hard to test parallel
sequential scan a few months ago, because there was so many threads
and so many dependencies. I appreciate that we now use git
format-patch patch series for complicated stuff these days, but it's
important to make it clear how everything fits together. That's
actually what I was thinking about when I said we need to be clear on
how things fit together from the CF app patch page, because there
doesn't seem to be a culture of being particular about that, having
good "annotations", etc.

-- 
Peter Geoghegan



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Feb 8, 2016 at 2:36 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> I have no problem running any test cases you wish on a branch in a loop for
> the next week and reporting back any errors.
>
> Where this gets tricky is the tooling itself. For me to be able to do so
> (and others really) I need to be able to do this:
>
>         * Download (preferably a tarball but I can do a git pull)
>         * Exact instructions on how to set up the tests
>         * Exact instructions on how to run the tests
>         * Exact instructions on how to report the tests
>
> If anyone takes the time to do that, I will take the time and resources to
> run them.

Well, what I've done is push into the buildfarm code that will allow
us to do *the most exhaustive* testing that I know how to do in an
automated fashion. Which is to create a file that says this:

force_parallel_mode=regress
max_parallel_degree=2

And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file

Now, that is not going to find bugs in the deadlock.c portion of the
group locking patch, but it's been wildly successful in finding bugs
in other parts of the parallelism code, and there might well be a few
more that we haven't found yet, which is why I'm hoping that we'll get
this procedure running regularly either on all buildfarm machines, or
on some subset of them, or on new animals that just do this.

Testing the deadlock.c changes is harder.  I don't know of a good way
to do it in an automated fashion, which is why I also posted the test
code Amit devised which allows construction of manual test cases.
Constructing a manual test case is *hard* but doable.  I think it
would be good to automate this and if somebody's got a good idea about
how to fuzz test it I think that would be *great*.  But that's not
easy to do.  We haven't had any testing at all of the deadlock
detector up until now, but somehow the deadlock detector itself has
been in the tree for  a very long time...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Alvaro Herrera
Дата:
Robert Haas wrote:

> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.  Right now, the deadlock.c part of this patch isn't
> tested at all by any of our regression test suites, because NOTHING in
> deadlock.c is tested by any of our regression test suites.  You can
> blow it up with dynamite and the regression tests are perfectly happy,
> and that's pretty scary.

FWIW a couple of months back I thought you had already pushed that one
and was surprised to find that you hadn't.  +1 from me on pushing it.
(I don't mean specifically the deadlock tests, but rather the
isolationtester changes that allowed you to have multiple blocked
backends.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Feb 8, 2016 at 2:48 PM, Peter Geoghegan <pg@heroku.com> wrote:
> FWIW, I appreciate your candor. However, I think that you could have
> done a better job of making things easier for reviewers, even if that
> might not have made an enormous difference. I suspect I would have not
> been able to get UPSERT done as a non-committer if it wasn't for the
> epic wiki page, that made it at least possible for someone to jump in.

I'm not going to argue with the proposition that it could have been
done better.  Equally, I'm going to disclaim having the ability to
have done it better.  I've been working on this for three years, and
most of the work that I've put into it has gone into tinkering with C
code that was not in any way user-testable.  I've modified essentially
every major component of the system.  We had a shared memory facility;
I built another one.  We had background workers; I overhauled them.  I
invented a message queueing system, and then layered a modified
version of the FE/BE protocol on top of that message queue, and then
later layered tuple-passing on top of that same message queue and then
invented a bespoke protocol that is used to handle typemod mapping.
We had a transaction system; I made substantial, invasive
modifications to it.  I tinkered with the GUC subsystem, the combocid
system, and the system for loading loadable modules.  Amit added read
functions to a whole class of nodes that never had them before and
together we overhauled core pieces of the executer machinery.  Then I
hit the planner with hammer.  Finally there's this patch, which
affects heavyweight locking and deadlock detection.  I don't believe
that during the time I've been involved with this project anyone else
has ever attempted a project that required changing as many subsystems
as this one did - in some cases rather lightly, but in a number of
cases in pretty significant, invasive ways.  No other project in
recent memory has been this invasive to my knowledge.  Hot Standby
probably comes closest, but I think (admittedly being much closer to
this work than I was to that work) that this has its fingers in more
places.  So, there may be a person who knows how to do all of that
work and get it done in a reasonable time frame and also knows how to
make sure that everybody has the opportunity to be as involved in the
process as they want to be and that there are no bugs or controversial
design decisions, but I am not that person.  I am doing my best.

> To be more specific, I thought it was really hard to test parallel
> sequential scan a few months ago, because there was so many threads
> and so many dependencies. I appreciate that we now use git
> format-patch patch series for complicated stuff these days, but it's
> important to make it clear how everything fits together. That's
> actually what I was thinking about when I said we need to be clear on
> how things fit together from the CF app patch page, because there
> doesn't seem to be a culture of being particular about that, having
> good "annotations", etc.

I agree that you had to be pretty deeply involved in that thread to
follow everything that was going on.  But it's not entirely fair to
say that it was impossible for anyone else to get involved.   Both
Amit and I, mostly Amit, posted directions at various times saying:
here is the sequence of patches that you currently need to apply as of
this time.  There was not a heck of a lot of evidence that anyone was
doing that, though, though I think a few people did, and towards the
end things changed very quickly as I committed patches in the series.
We certainly knew what each other were doing and not because of some
hidden off-list collaboration that we kept secret from the community -
we do talk every week, but almost all of our correspondence on those
patches was on-list.

I think it's an inherent peril of complicated patch sets that people
who are not intimately involved in what is going on will have trouble
following just because it takes a lot of work.  Is anybody here
following what is going on on the postgres_fdw join pushdown thread?
There's only one patch to apply there right now (though there have
been as many as four at times in the past) and the people who are
actually working on it can follow along, but I'm not a bit surprised
if other people feel lost.  It's hard to think that the cause of that
is anything other than "it's hard to find the time to get invested in
a patch that other people are already working hard and apparently
diligently on, especially if you're not personally interested in
seeing that patch get committed, but sometimes even if you are".  For
example, I really want the work Fabien and Andres are doing on the
checkpointer to get committed this release.  I am reading the emails,
but I haven't tried the patches and I probably won't.  I don't have
time to be that involved in every patch.  I'm trusting that whatever
Andres commits - which will probably be a whole lot more complex than
what Fabien initially did - will be the right thing to commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Peter Geoghegan
Дата:
On Mon, Feb 8, 2016 at 12:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> So, there may be a person who knows how to do all of that
> work and get it done in a reasonable time frame and also knows how to
> make sure that everybody has the opportunity to be as involved in the
> process as they want to be and that there are no bugs or controversial
> design decisions, but I am not that person.  I am doing my best.
>
>> To be more specific, I thought it was really hard to test parallel
>> sequential scan a few months ago, because there was so many threads
>> and so many dependencies. I appreciate that we now use git
>> format-patch patch series for complicated stuff these days, but it's
>> important to make it clear how everything fits together. That's
>> actually what I was thinking about when I said we need to be clear on
>> how things fit together from the CF app patch page, because there
>> doesn't seem to be a culture of being particular about that, having
>> good "annotations", etc.
>
> I agree that you had to be pretty deeply involved in that thread to
> follow everything that was going on.  But it's not entirely fair to
> say that it was impossible for anyone else to get involved.

All that I wanted to do was look at EXPLAIN ANALYZE output that showed
a parallel seq scan on my laptop, simply because I wanted to see a
cool thing happen. I had to complain about it [1] to get clarification
from you [2].

I accept that this might have been a somewhat isolated incident (that
I couldn't easily get *at least* a little instant gratification), but
it still should be avoided. You've accused me of burying the lead
plenty of times. Don't tell me that it was too hard to prominently
place those details somewhere where I or any other contributor could
reasonably expect to find them, like the CF app page, or a wiki page
that is maintained on an ongoing basis (and linked to at the start of
each thread). If I said that that was too much to you, you'd probably
shout at me. If I persisted, you wouldn't commit my patch, and for me
that probably means it's DOA.

I don't think I'm asking for much here.

[1] http://www.postgresql.org/message-id/CAM3SWZSefE4uQk3r_3gwpfDWWtT3P51SceVsL4=g8v_mE2Abtg@mail.gmail.com
[2] http://www.postgresql.org/message-id/CA+TgmoartTF8eptBhiNwxUkfkctsFc7WtZFhGEGQywE8e2vCmg@mail.gmail.com
-- 
Peter Geoghegan



Re: a raft of parallelism-related bug fixes

От
"Joshua D. Drake"
Дата:
On 02/08/2016 01:11 PM, Peter Geoghegan wrote:
> On Mon, Feb 8, 2016 at 12:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I accept that this might have been a somewhat isolated incident (that
> I couldn't easily get *at least* a little instant gratification), but
> it still should be avoided. You've accused me of burying the lead
> plenty of times. Don't tell me that it was too hard to prominently
> place those details somewhere where I or any other contributor could
> reasonably expect to find them, like the CF app page, or a wiki page
> that is maintained on an ongoing basis (and linked to at the start of
> each thread). If I said that that was too much to you, you'd probably
> shout at me. If I persisted, you wouldn't commit my patch, and for me
> that probably means it's DOA.
>
> I don't think I'm asking for much here.
>
> [1] http://www.postgresql.org/message-id/CAM3SWZSefE4uQk3r_3gwpfDWWtT3P51SceVsL4=g8v_mE2Abtg@mail.gmail.com
> [2] http://www.postgresql.org/message-id/CA+TgmoartTF8eptBhiNwxUkfkctsFc7WtZFhGEGQywE8e2vCmg@mail.gmail.com

This part of the thread seems like something that should be a new thread 
about how to write patches. I agree that patches that are large features 
that are in depth discussed on a maintained wiki page would be awesome. 
Creating that knowledge base without having to troll through code would 
be priceless in value.

Sincerely,

JD



-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Feb 8, 2016 at 4:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
> All that I wanted to do was look at EXPLAIN ANALYZE output that showed
> a parallel seq scan on my laptop, simply because I wanted to see a
> cool thing happen. I had to complain about it [1] to get clarification
> from you [2].
>
> I accept that this might have been a somewhat isolated incident (that
> I couldn't easily get *at least* a little instant gratification), but
> it still should be avoided. You've accused me of burying the lead
> plenty of times. Don't tell me that it was too hard to prominently
> place those details somewhere where I or any other contributor could
> reasonably expect to find them, like the CF app page, or a wiki page
> that is maintained on an ongoing basis (and linked to at the start of
> each thread). If I said that that was too much to you, you'd probably
> shout at me. If I persisted, you wouldn't commit my patch, and for me
> that probably means it's DOA.
>
> I don't think I'm asking for much here.

I don't think you are asking for too much; what I think is that Amit
and I were trying to do exactly the thing you asked for, and mostly
did.  On March 20th, Amit posted version 11 of the sequential scan
patch, and included directions about the order in which to apply the
patches:

http://www.postgresql.org/message-id/CAA4eK1JSSonzKSN=L-DWuCEWdLqkbMUjvfpE3fGW2tn2zPo2RQ@mail.gmail.com

On March 25th, Amit posted version 12 of the sequential scan patch,
and again included directions about which patches to apply:

http://www.postgresql.org/message-id/CAA4eK1L50Y0Y1OGt_DH2eOUyQ-rQCnPvJBOon2PcGjq+1byi4w@mail.gmail.com

On March 27th, Amit posted version 13 of the sequential scan patch,
which did not include those directions:

http://www.postgresql.org/message-id/CAA4eK1LFR8sR9viUpLPMKRqUVcRhEFDjSz1019rpwgjYftrXeQ@mail.gmail.com

While perhaps Amit might have included directions again, I think it's
pretty reasonable that he felt that it might not be entirely necessary
to do so given that he had already done it twice in the last week.
This was still the state of affairs when you asked your question on
April 20th.  Two days after you asked that question, Amit posted
version 14 of the patch, and again included directions about what
patches to apply:

http://www.postgresql.org/message-id/CAA4eK1JLv+2y1AwjhsQPFisKhBF7jWF_Nzirmzyno9uPBRCpGw@mail.gmail.com

Far from the negligence that you seem to be implying, I think Amit was
remarkably diligent about providing these kinds of updates.  I
admittedly didn't duplicate those same updates on the parallel
mode/contexts thread to which you replied, but that's partly because I
would often whack around that patch first and then Amit would adjust
his patch to cope with my changes after the fact.  That doesn't seem
to have been the case in this particular example, but if this is the
closest thing you can come up with to a process failure during the
development of parallel query, I'm not going to be sad about it: I'm
going to have a beer.  Seriously: we worked really hard at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Peter Geoghegan
Дата:
On Mon, Feb 8, 2016 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Far from the negligence that you seem to be implying, I think Amit was
> remarkably diligent about providing these kinds of updates.

I don't think I remotely implied negligence. That word has very severe
connotations (think "criminal negligence") that are far from what I
intended.

> I admittedly didn't duplicate those same updates on the parallel
> mode/contexts thread to which you replied, but that's partly because I
> would often whack around that patch first and then Amit would adjust
> his patch to cope with my changes after the fact.  That doesn't seem
> to have been the case in this particular example, but if this is the
> closest thing you can come up with to a process failure during the
> development of parallel query, I'm not going to be sad about it: I'm
> going to have a beer.  Seriously: we worked really hard at this.

I don't want to get stuck on that one example, which I acknowledged
might not be representative when I raised it. I'm not really talking
about parallel query in particular anyway. I'm mostly arguing for a
consistent way to get instructions on how to at least build the patch,
where that might be warranted.

The CF app is one way. Another good way is: As long as we're using a
patch series, be explicit about what goes where in the commit message.
Have message-id references. That sort of thing. I already try to do
that. That's all.

Thank you (and Amit) for working really hard on parallelism.

-- 
Peter Geoghegan



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Feb 8, 2016 at 4:54 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Feb 8, 2016 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Far from the negligence that you seem to be implying, I think Amit was
>> remarkably diligent about providing these kinds of updates.
>
> I don't think I remotely implied negligence. That word has very severe
> connotations (think "criminal negligence") that are far from what I
> intended.

OK, sorry, I think I misread your tone.

> I don't want to get stuck on that one example, which I acknowledged
> might not be representative when I raised it. I'm not really talking
> about parallel query in particular anyway. I'm mostly arguing for a
> consistent way to get instructions on how to at least build the patch,
> where that might be warranted.
>
> The CF app is one way. Another good way is: As long as we're using a
> patch series, be explicit about what goes where in the commit message.
> Have message-id references. That sort of thing. I already try to do
> that. That's all.

Yeah, me too.  Generally, although with some exceptions, my practice
is to keep reposting the whole patch stack, so that everything is in
one email.  In this particular case, though, there were patches from
me and patches from Amit, so that was harder to do.  I wasn't using
his patches to test my patches; I had other test code for that.  He
was using my patches as a base for his patches, but linked to them
instead of reposting them.  That's an unusually complicated scenario,
though: it's pretty rare around here to have two developers working
together on something as closely as Amit and I did on those patches.

> Thank you (and Amit) for working really hard on parallelism.

Thanks.

By the way, it bears saying, or if I've said it before repeating, that
although most of the parallelism code that has been committed was
written by me, Amit has made an absolutely invaluable contribution to
parallel query, and it wouldn't be committed today or maybe ever
without that contribution.  In addition to those parts of the code
that were committed as he wrote them, he prototyped quite a number of
things that I ended up rewriting, reviewed a ton of code that I wrote
and found bugs in it, wrote numerous bits and pieces of test code, and
generally put up with an absolutely insane level of me nitpicking his
work, breaking it by committing pieces of it or committing different
pieces that replaced pieces he had, demanding repeated rebases on
short time scales, and generally beating him up in just about every
conceivable way.  I am deeply appreciative of him being willing to
jump into this project, do a ton of work, and put up with me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Andres Freund
Дата:
Hi Robert,

On 2016-02-08 13:45:37 -0500, Robert Haas wrote:
> > I realize that this stuff has all been brewing long, and that there's
> > still a lot to do. So you gotta keep moving. And I'm not sure that
> > there's anything wrong or if there's any actually better approach. But
> > pushing an unreviewed, complex patch, that originated in a thread
> > orginally about different relatively small/mundane items, for a
> > contentious issue, a few days after the initial post. Hm. Not sure how
> > you'd react if you weren't the author.
> 
> Probably not very well.  Do you want me to revert it?

No. I want(ed) to express that I am not comfortable with how this got
in. My aim wasn't to generate a flurry of responses with everybody
piling on, or anything like that. But it's unfortunately hard to
avoid. I wish I knew a way, besides only sending private mails. Which I
don't think is a great approach either.

I do agree that we need something to tackle this problem, and that this
quite possibly is the least bad way to do this. And certainly the only
one that's been implemented and posted with any degree of completeness.

But even given the last paragraph, posting a complex new patch in a
somewhat related thread, and then pushing it 5 days later is pretty darn
quick.


> I mean, look.  [explanation why we need the infrastructure].  Do I really
> think anybody was going to spend the time to understand deadlock.c
> well enough to verify my changes?  No, I don't.  What I think would
> have happened is that the patch would have sat around like an
> albatross around my neck - totally ignored by everyone - until the end
> of the last CF, and then the discussion would have gone one of three
> ways:

Yes, believe me, I really get that. It's awfully hard to get substantial
review for pieces of code that require a lot of context.

But I think posting this patch in a new thread, posting a message that
you're intending to commit unless somebody protests with a substantial
arguments and/or a timeline of review, and then waiting a few days, are
something that should be done for a major piece of new infrastructure,
especially when it's somewhat controversial.

This doesn't just affect parallel execution, it affects one of least
understood parts of postgres code. And where hard to find bugs, likely
to only trigger in production, are to be expected.


> And, by the way, the patch, aside from the deadlock.c portion, was
> posted back in October, admittedly without much fanfare, but nobody
> reviewed that or any other patch on this thread.

I think it's unrealistic to expect random patches without a commitest
entry, posted somewhere deep in a thread, to get a review when there's
so many open commitfest entries that haven't gotten feedback, and which
we are supposed to look at.


> If I'd waited for those reviews to come in, parallel query would not
> be committed now, nor probably in 9.6, nor probably in 9.7 or 9.8
> either.  The whole project would just be impossible on its face.

Yes, that's a problem. But you're not the only one facing it, and you've
argued hard against such an approach in some other cases.


> I think it's myopic to say "well, but this patch might have bugs".
> Very true.  But also, all the other parallelism patches that are
> already committed or that are still under review but which can't be
> properly tested without this patch might have bugs, too, so you've got
> to weigh the risk that this patch might get better if I wait longer to
> commit it against the possibility that not having committed it reduces
> the chances of finding bugs elsewhere.  I don't want it to seem like
> I'm forcing this down the community's throat - I don't have a right to
> do that, and I will certainly revert this patch if that is the
> consensus.  But that is not what I think best myself.  What I think
> would be better is to (1) make an effort to get the buildfarm testing
> which this patch enables up and running as soon as possible and (2)
> for somebody to read over the committed code and raise any issues that
> they find.  Or, for that matter, to read over the committed code for
> any of the *other* parallelism patches and raise any issues that they
> find with *that* code.  There's certainly scads of code here and this
> is far from the only bit that might have bugs.

I think you are, and *you have to*, walk a very thin line here. I agree
that realistically there's just nobody with the bandwidth to keep up
with a fully loaded Robert. Not without ignoring their own stuff at
least. And I think the importance of what you're building means we need
to be flexible.  But I think that thin line in turn means that you have
to be *doubly* careful about communication. I.e. post new infrastructure
to new threads, "warn" that you're intending to commit something
potentially needing debate/review, etc.


> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.

I think adding new regression tests should have a barrier to commit
that's about two magnitudes lower than something like group locks. I
mean the worst that they could so is to flap around for some reason, or
take a bit too long. So please please go ahead.

Greetings,

Andres Freund



Re: a raft of parallelism-related bug fixes

От
Andres Freund
Дата:
On 2016-02-08 15:18:13 -0500, Robert Haas wrote:
> I agree that you had to be pretty deeply involved in that thread to
> follow everything that was going on.  But it's not entirely fair to
> say that it was impossible for anyone else to get involved.   Both
> Amit and I, mostly Amit, posted directions at various times saying:
> here is the sequence of patches that you currently need to apply as of
> this time.  There was not a heck of a lot of evidence that anyone was
> doing that, though, though I think a few people did, and towards the
> end things changed very quickly as I committed patches in the series.
> We certainly knew what each other were doing and not because of some
> hidden off-list collaboration that we kept secret from the community -
> we do talk every week, but almost all of our correspondence on those
> patches was on-list.

I think having a public git tree, that contains the current state, is
greatly helpful for that. Just announce that you're going to screw
wildly with history, and that you're not going to be terribly careful
about commit messages.  That means observers can just do a fetch and a
reset --hard to see the absolutely latest and greatest.  By all means
post a series to the list every now and then, but I think for minor
changes it's perfectly sane to say 'pull to see the fixups for the
issues you noticed'.


> I think it's an inherent peril of complicated patch sets that people
> who are not intimately involved in what is going on will have trouble
> following just because it takes a lot of work.

True. But it becomes doubly hard if there's no up-to-date high level
design overview somewhere outside $sizeable_brain. I know it sucks to
write these, believe me. Especially because one definitely feels that
nobody is reading those.

Greetings,

Andres Freund



Re: a raft of parallelism-related bug fixes

От
Peter Geoghegan
Дата:
On Mon, Feb 8, 2016 at 2:35 PM, Andres Freund <andres@anarazel.de> wrote:
> I think having a public git tree, that contains the current state, is
> greatly helpful for that. Just announce that you're going to screw
> wildly with history, and that you're not going to be terribly careful
> about commit messages.  That means observers can just do a fetch and a
> reset --hard to see the absolutely latest and greatest.  By all means
> post a series to the list every now and then, but I think for minor
> changes it's perfectly sane to say 'pull to see the fixups for the
> issues you noticed'.

I would really like for there to be a way to do that more often. It
would be a significant time saver, because it removes problems with
minor bitrot.

-- 
Peter Geoghegan



Re: a raft of parallelism-related bug fixes

От
Robert Haas
Дата:
On Mon, Feb 8, 2016 at 5:27 PM, Andres Freund <andres@anarazel.de> wrote:
>> > contentious issue, a few days after the initial post. Hm. Not sure how
>> > you'd react if you weren't the author.
>>
>> Probably not very well.  Do you want me to revert it?
>
> No. I want(ed) to express that I am not comfortable with how this got
> in. My aim wasn't to generate a flurry of responses with everybody
> piling on, or anything like that. But it's unfortunately hard to
> avoid. I wish I knew a way, besides only sending private mails. Which I
> don't think is a great approach either.
>
> I do agree that we need something to tackle this problem, and that this
> quite possibly is the least bad way to do this. And certainly the only
> one that's been implemented and posted with any degree of completeness.
>
> But even given the last paragraph, posting a complex new patch in a
> somewhat related thread, and then pushing it 5 days later is pretty darn
> quick.

Sorry.  I understand your discomfort, and you're probably right.  I'll
try to handle it better next time.  I think my frustration with the
process got the better of me a little bit here.  This patch may very
well not be perfect, but it's sure as heck better than doing nothing,
and if I'd gone out of my way to say "hey, everybody, here's a patch
that you might want to object to" I'm sure I could have found some
volunteers to do just that.  But, you know, that's not really what I
want.  What I want is somebody to do a detailed review and help me fix
whatever the problems the patch may have.  And ideally, I'd like that
person to understand that you can't have parallel query without doing
something in this area - which I think you do, but certainly not
everybody probably did - and that a lot of simplistic, non-invasive
ideas for how to handle this are going to be utterly inadequate in
complex cases.  Unless you or Noah want to take a hand, I don't expect
to get that sort of review.  Now, that having been said, I think your
frustration with the way I handled it is somewhat justified, and since
you are not arguing for a revert I'm not sure what I can do except try
not to let my frustration get in the way next time.  Which I will try
to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: a raft of parallelism-related bug fixes

От
Andres Freund
Дата:
Hi!

Thanks for the answer. Sounds good.

On 2016-02-08 18:47:18 -0500, Robert Haas wrote:
> and if I'd gone out of my way to say "hey, everybody, here's a patch
> that you might want to object to" I'm sure I could have found some
> volunteers to do just that.  But, you know, that's not really what I
> want.

Sometimes I wonder if three shooting-from-the-hip answers shouldn't cost
a jog around the block or such (of which I'm sometimes guilty as
well!). Wouldn't just help the on-list volume, but also our collective
health ;)

> Unless you or Noah want to take a hand, I don't expect to get that
> sort of review.  Now, that having been said, I think your frustration
> with the way I handled it is somewhat justified, and since you are not
> arguing for a revert I'm not sure what I can do except try not to let
> my frustration get in the way next time.  Which I will try to do.

FWIW, I do hope to put more time into reviewing parallelism stuff in the
coming weeks. It's hard to balance all that one likes to do.

- Andres



postgres_fdw vs. force_parallel_mode on ppc

От
Noah Misch
Дата:
On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote:
> Well, what I've done is push into the buildfarm code that will allow
> us to do *the most exhaustive* testing that I know how to do in an
> automated fashion. Which is to create a file that says this:
>
> force_parallel_mode=regress
> max_parallel_degree=2
>
> And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file
>
> Now, that is not going to find bugs in the deadlock.c portion of the
> group locking patch, but it's been wildly successful in finding bugs
> in other parts of the parallelism code, and there might well be a few
> more that we haven't found yet, which is why I'm hoping that we'll get
> this procedure running regularly either on all buildfarm machines, or
> on some subset of them, or on new animals that just do this.

I configured a copy of animal "mandrill" that way and launched a test run.
The postgres_fdw suite failed as attached.  A manual "make -C contrib
installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
check-world passes everywhere.

Вложения

Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> I configured a copy of animal "mandrill" that way and launched a test run.
> The postgres_fdw suite failed as attached.  A manual "make -C contrib
> installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> check-world passes everywhere.

Hm, is this with or without the ppc-related atomics fix you just found?
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Noah Misch
Дата:
On Mon, Feb 15, 2016 at 06:07:48PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I configured a copy of animal "mandrill" that way and launched a test run.
> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> > check-world passes everywhere.
> 
> Hm, is this with or without the ppc-related atomics fix you just found?

Without those.  The ppc64 GNU/Linux configuration used gcc, though, and the
atomics change affects xlC only.  Also, the postgres_fdw behavior does not
appear probabilistic; it failed twenty times in a row.



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Mon, Feb 15, 2016 at 5:52 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote:
>> Well, what I've done is push into the buildfarm code that will allow
>> us to do *the most exhaustive* testing that I know how to do in an
>> automated fashion. Which is to create a file that says this:
>>
>> force_parallel_mode=regress
>> max_parallel_degree=2
>>
>> And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file
>>
>> Now, that is not going to find bugs in the deadlock.c portion of the
>> group locking patch, but it's been wildly successful in finding bugs
>> in other parts of the parallelism code, and there might well be a few
>> more that we haven't found yet, which is why I'm hoping that we'll get
>> this procedure running regularly either on all buildfarm machines, or
>> on some subset of them, or on new animals that just do this.
>
> I configured a copy of animal "mandrill" that way and launched a test run.
> The postgres_fdw suite failed as attached.  A manual "make -C contrib
> installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> check-world passes everywhere.

Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
test suites.  Is there any reason for that, or is it just kinda where
we ended up?

Retrying it the way you did it, I see the same errors here, so I think
this isn't a PPC-specific problem, but just a problem in general.
I've actually seen these kinds of errors before in earlier versions of
the testing code that eventually became force_parallel_mode.  I got
fooled into believing I'd fixed the problem because of my confusion
about how TEMP_CONFIG worked.  I think this is more likely to be a bug
in force_parallel_mode than a bug in the code that checks whether a
normal parallel query is safe, but I'll have to track it down before I
can say for sure.

Thanks for testing this.  It's not delightful to discover that I
muffed this, but better to find it now than in 6 months.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Noah Misch
Дата:
On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote:
> On Mon, Feb 15, 2016 at 5:52 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Feb 08, 2016 at 02:49:27PM -0500, Robert Haas wrote:
> >> force_parallel_mode=regress
> >> max_parallel_degree=2
> >>
> >> And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file

> > I configured a copy of animal "mandrill" that way and launched a test run.
> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
> > check-world passes everywhere.
> 
> Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
> test suites.  Is there any reason for that, or is it just kinda where
> we ended up?

To my knowledge, it's just the undesirable place we ended up.



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote:
>> Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
>> test suites.  Is there any reason for that, or is it just kinda where
>> we ended up?

> To my knowledge, it's just the undesirable place we ended up.

Yeah.  +1 for fixing that, if it's not unreasonably painful.
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Andrew Dunstan
Дата:

On 02/15/2016 07:57 PM, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
>> On Mon, Feb 15, 2016 at 07:31:40PM -0500, Robert Haas wrote:
>>> Oh, crap.  I didn't realize that TEMP_CONFIG didn't affect the contrib
>>> test suites.  Is there any reason for that, or is it just kinda where
>>> we ended up?
>> To my knowledge, it's just the undesirable place we ended up.
> Yeah.  +1 for fixing that, if it's not unreasonably painful.
>
>             


+1 for fixing it everywhere.

Historical note: back when TEMP_CONFIG was implemented, the main 
regression set was just about the only set of tests the buildfarm ran 
using a temp install. That wasn't even available for contrib and the 
PLs, IIRC.

cheers

andrew






Re: a raft of parallelism-related bug fixes

От
Jim Nasby
Дата:
On 2/8/16 4:39 PM, Peter Geoghegan wrote:
> On Mon, Feb 8, 2016 at 2:35 PM, Andres Freund <andres@anarazel.de> wrote:
>> I think having a public git tree, that contains the current state, is
>> greatly helpful for that. Just announce that you're going to screw
>> wildly with history, and that you're not going to be terribly careful
>> about commit messages.  That means observers can just do a fetch and a
>> reset --hard to see the absolutely latest and greatest.  By all means
>> post a series to the list every now and then, but I think for minor
>> changes it's perfectly sane to say 'pull to see the fixups for the
>> issues you noticed'.
>
> I would really like for there to be a way to do that more often. It
> would be a significant time saver, because it removes problems with
> minor bitrot.

Yeah, I think it's rather silly that we limit ourselves to only pushing 
patches through a mailing list. That's OK (maybe even better) for simple 
stuff, but once there's more than 1 patch it's a PITA.

There's an official github mirror of the code, ISTM it'd be good for 
major features to get posted to github forks in their own branches. I 
think that would also make it easy for buildfarm owners to run tests 
against trusted forks/branches.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: a raft of parallelism-related bug fixes

От
Craig Ringer
Дата:
On 9 February 2016 at 03:00, Joshua D. Drake <jd@commandprompt.com> wrote:
 

I think this further points to the need for more reviewers and less feature pushes. There are fundamental features that we could use, this is one of them. It is certainly more important than say pgLogical or BDR (not that those aren't useful but that we do have external solutions for that problem).


Well, with the pglogical and BDR work most of the work has been along similar lines - getting the infrastructure in place. Commit timestamps, logical decoding, and other features that are useful way beyond pglogical/BDR. Logical decoding in particular is rapidly becoming a really significant feature as people start to see the potential for it in integration and ETL processes.

I'm not sure anyone takes the pglogical downstream submission as a serious attempt at inclusion in 9.6, and even submitting the upstream was significantly a RFC at least as far as 9.6 is concerned. I don't think the downstream submission took any significant time or attention away from other work.

The main result has been useful discussions on remaining pieces needed for DDL replication etc and some greater awareness among others in the community about what's going on in the area. I think that's a generally useful thing.


Oh: another thing that I would like to do is commit the isolation
tests I wrote for the deadlock detector a while back, which nobody has
reviewed either, though Tom and Alvaro seemed reasonably positive
about the concept.  Right now, the deadlock.c part of this patch isn't
tested at all by any of our regression test suites, because NOTHING in
deadlock.c is tested by any of our regression test suites.  You can
blow it up with dynamite and the regression tests are perfectly happy,
and that's pretty scary.

Test test test. Please commit.


Yeah. Enhancing the isolation tests would be useful. Please commit those changes. Even if they broke something in the isolation tester - which isn't likely - forward movement in test infrastructure is important and we should IMO have a lower bar for committing changes there. They won't directly affect code end users are running.
 
I should resurrect Abhijit's patch to allow the isolationtester to talk to multiple servers. We'll want that when we're doing tests like "assert that this change isn't visible on the replica before it becomes visible on the master". (Well, except we violate that one with our funky synchronous_commit implementation...)

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: a raft of parallelism-related bug fixes

От
Amit Langote
Дата:
On 2016/02/18 16:38, Craig Ringer wrote:
> I should resurrect Abhijit's patch to allow the isolationtester to talk to
> multiple servers. We'll want that when we're doing tests like "assert that
> this change isn't visible on the replica before it becomes visible on the
> master". (Well, except we violate that one with our funky
> synchronous_commit implementation...)

How much does (or does not) that overlap with the recovery test suite work
undertaken by Michael et al? I saw some talk of testing for patches in
works on the N synchronous standbys thread.

Thanks,
Amit





Re: a raft of parallelism-related bug fixes

От
Michael Paquier
Дата:
On Thu, Feb 18, 2016 at 5:35 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/02/18 16:38, Craig Ringer wrote:
>> I should resurrect Abhijit's patch to allow the isolationtester to talk to
>> multiple servers. We'll want that when we're doing tests like "assert that
>> this change isn't visible on the replica before it becomes visible on the
>> master". (Well, except we violate that one with our funky
>> synchronous_commit implementation...)
>
> How much does (or does not) that overlap with the recovery test suite work
> undertaken by Michael et al? I saw some talk of testing for patches in
> works on the N synchronous standbys thread.

This sounds like poll_query_until in PostgresNode.pm (already on HEAD)
where the query used is something on pg_stat_replication for a given
LSN to see if a standby has reached a given replay position.
-- 
Michael



Re: a raft of parallelism-related bug fixes

От
Craig Ringer
Дата:
On 18 February 2016 at 20:35, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Feb 18, 2016 at 5:35 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/02/18 16:38, Craig Ringer wrote:
>> I should resurrect Abhijit's patch to allow the isolationtester to talk to
>> multiple servers. We'll want that when we're doing tests like "assert that
>> this change isn't visible on the replica before it becomes visible on the
>> master". (Well, except we violate that one with our funky
>> synchronous_commit implementation...)
>
> How much does (or does not) that overlap with the recovery test suite work
> undertaken by Michael et al? I saw some talk of testing for patches in
> works on the N synchronous standbys thread.

This sounds like poll_query_until in PostgresNode.pm (already on HEAD)
where the query used is something on pg_stat_replication for a given
LSN to see if a standby has reached a given replay position.

No, it's quite different, though that's something handy to have that I've emulated in the isolationtester using a plpgsql function.

The isolationtester changes in question allow isolationtester specs to run different blocks against different hosts/ports/DBs.

That lets you  make assertions about replication behaviour. It was built for BDR and I think we'll need something along those lines in core if/when any kind of logical replication facilities land, for things like testing failover slots, etc.

The patch is at:


and might be something it's worth having in core as we expand testing of replication, failover, etc.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: a raft of parallelism-related bug fixes

От
Michael Paquier
Дата:
On Thu, Feb 18, 2016 at 9:45 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> That lets you  make assertions about replication behaviour. It was built for
> BDR and I think we'll need something along those lines in core if/when any
> kind of logical replication facilities land, for things like testing
> failover slots, etc.
>
> The patch is at:
>
> http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=commit;h=d859de3b13d39d4eddd91f3e6f316a48d31ee0fe
>
> and might be something it's worth having in core as we expand testing of
> replication, failover, etc.

Maybe there is an advantage to have it, but that's hard to make an
opinion without a complicated test case. Both of those things could
clearly work with each other at first sight. PostgresNode can set up a
set of nodes and this patch would be in charge of more complex
scenarios where the same connection or transaction block is needed.
-- 
Michael



Re: a raft of parallelism-related bug fixes

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Feb 8, 2016 at 2:36 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> I have no problem running any test cases you wish on a branch in a loop for
>> the next week and reporting back any errors.

> Well, what I've done is push into the buildfarm code that will allow
> us to do *the most exhaustive* testing that I know how to do in an
> automated fashion. Which is to create a file that says this:

> force_parallel_mode=regress
> max_parallel_degree=2

I did a few dozen runs of the core regression tests with those settings
(using current HEAD plus my lockGroupLeaderIdentifier-ectomy patch).
Roughly one time in ten, it fails in the stats test, with diffs as
attached.  I interpret this as meaning that parallel workers don't
reliably transmit stats to the stats collector, though maybe there
is something else happening.

            regards, tom lane

*** /home/postgres/pgsql/src/test/regress/expected/stats.out    Wed Mar  4 00:55:25 2015
--- /home/postgres/pgsql/src/test/regress/results/stats.out    Sun Feb 21 12:59:27 2016
***************
*** 148,158 ****
   WHERE relname like 'trunc_stats_test%' order by relname;
        relname      | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup
  -------------------+-----------+-----------+-----------+------------+------------
!  trunc_stats_test  |         3 |         0 |         0 |          0 |          0
!  trunc_stats_test1 |         4 |         2 |         1 |          1 |          0
!  trunc_stats_test2 |         1 |         0 |         0 |          1 |          0
!  trunc_stats_test3 |         4 |         0 |         0 |          2 |          2
!  trunc_stats_test4 |         2 |         0 |         0 |          0 |          2
  (5 rows)

  SELECT st.seq_scan >= pr.seq_scan + 1,
--- 148,158 ----
   WHERE relname like 'trunc_stats_test%' order by relname;
        relname      | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup
  -------------------+-----------+-----------+-----------+------------+------------
!  trunc_stats_test  |         0 |         0 |         0 |          0 |          0
!  trunc_stats_test1 |         0 |         0 |         0 |          0 |          0
!  trunc_stats_test2 |         0 |         0 |         0 |          0 |          0
!  trunc_stats_test3 |         0 |         0 |         0 |          0 |          0
!  trunc_stats_test4 |         0 |         0 |         0 |          0 |          0
  (5 rows)

  SELECT st.seq_scan >= pr.seq_scan + 1,

======================================================================


Re: postgres_fdw vs. force_parallel_mode on ppc

От
Thomas Munro
Дата:
On Tue, Feb 16, 2016 at 12:12 PM, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Feb 15, 2016 at 06:07:48PM -0500, Tom Lane wrote:
>> Noah Misch <noah@leadboat.com> writes:
>> > I configured a copy of animal "mandrill" that way and launched a test run.
>> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
>> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
>> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
>> > check-world passes everywhere.
>>
>> Hm, is this with or without the ppc-related atomics fix you just found?
>
> Without those.  The ppc64 GNU/Linux configuration used gcc, though, and the
> atomics change affects xlC only.  Also, the postgres_fdw behavior does not
> appear probabilistic; it failed twenty times in a row.

The postgres_fdw failure is a visibility-of-my-own-uncommitted-work
problem.  The first command in a transaction updates a row via an FDW,
and then the SELECT expects to see the effects, but when run in a
background worker it creates a new FDW connection that can't see the
uncommitted UPDATE.

I wonder if parallelism of queries involving an FDW should not be
allowed if your transaction has written through the FDW.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Mon, Feb 22, 2016 at 11:02 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, Feb 16, 2016 at 12:12 PM, Noah Misch <noah@leadboat.com> wrote:
>> On Mon, Feb 15, 2016 at 06:07:48PM -0500, Tom Lane wrote:
>>> Noah Misch <noah@leadboat.com> writes:
>>> > I configured a copy of animal "mandrill" that way and launched a test run.
>>> > The postgres_fdw suite failed as attached.  A manual "make -C contrib
>>> > installcheck" fails the same way on a ppc64 GNU/Linux box, but it passes on
>>> > x86_64 and aarch64.  Since contrib test suites don't recognize TEMP_CONFIG,
>>> > check-world passes everywhere.
>>>
>>> Hm, is this with or without the ppc-related atomics fix you just found?
>>
>> Without those.  The ppc64 GNU/Linux configuration used gcc, though, and the
>> atomics change affects xlC only.  Also, the postgres_fdw behavior does not
>> appear probabilistic; it failed twenty times in a row.
>
> The postgres_fdw failure is a visibility-of-my-own-uncommitted-work
> problem.  The first command in a transaction updates a row via an FDW,
> and then the SELECT expects to see the effects, but when run in a
> background worker it creates a new FDW connection that can't see the
> uncommitted UPDATE.
>
> I wonder if parallelism of queries involving an FDW should not be
> allowed if your transaction has written through the FDW.

Foreign tables are supposed to be categorically excluded from
parallelism.  Not sure why that's not working in this instance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Feb 22, 2016 at 11:02 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> The postgres_fdw failure is a visibility-of-my-own-uncommitted-work
>> problem.  The first command in a transaction updates a row via an FDW,
>> and then the SELECT expects to see the effects, but when run in a
>> background worker it creates a new FDW connection that can't see the
>> uncommitted UPDATE.

> Foreign tables are supposed to be categorically excluded from
> parallelism.  Not sure why that's not working in this instance.

I've not looked at the test case to see if this is exactly what's
going wrong, but it's pretty easy to see how there might be a problem:
consider a STABLE user-defined function that does a SELECT from a foreign
table.  If that function call gets pushed down into a parallel worker
then it would fail as described.
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
>> Foreign tables are supposed to be categorically excluded from
>> parallelism.  Not sure why that's not working in this instance.

BTW, I wonder where you think that's supposed to be enforced, because
I sure can't find any such logic.

I suppose that has_parallel_hazard() would be the logical place to
notice foreign tables, but it currently doesn't even visit RTEs,
much less contain any code to check if their tables are foreign.
Or did you have another place in mind to do that?
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Thomas Munro
Дата:
On Tue, Feb 23, 2016 at 4:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Feb 22, 2016 at 11:02 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> The postgres_fdw failure is a visibility-of-my-own-uncommitted-work
>>> problem.  The first command in a transaction updates a row via an FDW,
>>> and then the SELECT expects to see the effects, but when run in a
>>> background worker it creates a new FDW connection that can't see the
>>> uncommitted UPDATE.
>
>> Foreign tables are supposed to be categorically excluded from
>> parallelism.  Not sure why that's not working in this instance.
>
> I've not looked at the test case to see if this is exactly what's
> going wrong, but it's pretty easy to see how there might be a problem:
> consider a STABLE user-defined function that does a SELECT from a foreign
> table.  If that function call gets pushed down into a parallel worker
> then it would fail as described.

I thought user defined functions were not a problem since it's the
user's responsibility to declare functions' parallel safety correctly.
The manual says: "In general, if a function is labeled as being safe
when it is restricted or unsafe, or if it is labeled as being
restricted when it is in fact unsafe, it may throw errors or produce
wrong answers when used in a parallel query"[1].  Uncommitted changes
on foreign tables are indeed invisible to functions declared as
PARALLEL SAFE, when run with force_parallel_mode = on,
max_parallel_degree = 2, but the default is UNSAFE and in that case
the containing query is never parallelised.  Perhaps the documentation
could use a specific mention of this subtlety with FDWs in the
PARALLEL section?

The case of a plain old SELECT (as seen in the failing regression
test) is definitely a problem though and FDW access there needs to be
detected automatically.  I also thought that
has_parallel_hazard_walker might be the right place for that logic, as
you suggested in your later message.

[1] http://www.postgresql.org/docs/devel/static/sql-createfunction.html

-- 
Thomas Munro
http://www.enterprisedb.com



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Feb 23, 2016 at 4:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've not looked at the test case to see if this is exactly what's
>> going wrong, but it's pretty easy to see how there might be a problem:
>> consider a STABLE user-defined function that does a SELECT from a foreign
>> table.  If that function call gets pushed down into a parallel worker
>> then it would fail as described.

> I thought user defined functions were not a problem since it's the
> user's responsibility to declare functions' parallel safety correctly.
> The manual says: "In general, if a function is labeled as being safe
> when it is restricted or unsafe, or if it is labeled as being
> restricted when it is in fact unsafe, it may throw errors or produce
> wrong answers when used in a parallel query"[1].

Hm.  I'm not terribly happy with this its-the-users-problem approach to
things, mainly because I have little confidence that somebody couldn't
figure out a security exploit based on it.

> The case of a plain old SELECT (as seen in the failing regression
> test) is definitely a problem though and FDW access there needs to be
> detected automatically.

Yes, the problem we're actually seeing in that regression test is not
dependent on a function wrapper.
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Tue, Feb 23, 2016 at 2:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>> Foreign tables are supposed to be categorically excluded from
>>> parallelism.  Not sure why that's not working in this instance.
>
> BTW, I wonder where you think that's supposed to be enforced, because
> I sure can't find any such logic.
>
> I suppose that has_parallel_hazard() would be the logical place to
> notice foreign tables, but it currently doesn't even visit RTEs,
> much less contain any code to check if their tables are foreign.
> Or did you have another place in mind to do that?

RTEs are checked in set_rel_consider_parallel(), and I thought there
was a check there related to foreign tables, but there isn't.  Oops.
In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
blindly assume that foreign scans are not parallel-safe, but we can't
blindly assume the opposite either.  Maybe we should assume that the
foreign scan is parallel-safe only if one or more of the new methods
introduced by the aforementioned commit are set, but actually that
doesn't seem quite right.  That would tell us whether the scan itself
can be parallelized, not whether it's safe to run serially but within
a parallel worker.  I think maybe we need a new FDW API that gets
called from set_rel_consider_parallel() with the root, rel, and rte as
arguments and which can return a Boolean.  If the callback is not set,
assume false.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Feb 23, 2016 at 2:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Foreign tables are supposed to be categorically excluded from
>>> parallelism.  Not sure why that's not working in this instance.

>> BTW, I wonder where you think that's supposed to be enforced, because
>> I sure can't find any such logic.

> RTEs are checked in set_rel_consider_parallel(), and I thought there
> was a check there related to foreign tables, but there isn't.  Oops.

Even if there were, it would not fix this bug, because AFAICS the only
thing that set_rel_consider_parallel is chartered to do is set the
per-relation consider_parallel flag.  The failure that is happening in
that regression test with force_parallel_mode turned on happens because
standard_planner plasters a Gather node at the top of the plan, causing
the whole plan including the FDW access to happen inside a parallel
worker.  The only way to prevent that is to clear the
wholePlanParallelSafe flag, which as far as I can tell (not that any of
this is documented worth a damn) isn't something that
set_rel_consider_parallel is supposed to do.

It looks to me like there is a good deal of fuzzy thinking here about the
difference between locally parallelizable and globally parallelizable
plans, ie Gather at the top vs Gather somewhere else.  I also note with
dismay that turning force_parallel_mode on seems to pretty much disable
any testing of local parallelism.

> In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
> blindly assume that foreign scans are not parallel-safe, but we can't
> blindly assume the opposite either.  Maybe we should assume that the
> foreign scan is parallel-safe only if one or more of the new methods
> introduced by the aforementioned commit are set, but actually that
> doesn't seem quite right.  That would tell us whether the scan itself
> can be parallelized, not whether it's safe to run serially but within
> a parallel worker.  I think maybe we need a new FDW API that gets
> called from set_rel_consider_parallel() with the root, rel, and rte as
> arguments and which can return a Boolean.  If the callback is not set,
> assume false.

Meh.  As things stand, postgres_fdw would have to aver that it can't ever
be safely parallelized, which doesn't seem like a very satisfactory answer
even if there are other FDWs that work differently (and which would those
be?  None that use a socket-style connection to an external server.)

The commit you mention above seems to me to highlight the dangers of
accepting hook patches with no working use-case to back them up.
AFAICT it's basically useless for typical FDWs because of this
multiple-connection problem.
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Tue, Feb 23, 2016 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Even if there were, it would not fix this bug, because AFAICS the only
> thing that set_rel_consider_parallel is chartered to do is set the
> per-relation consider_parallel flag.  The failure that is happening in
> that regression test with force_parallel_mode turned on happens because
> standard_planner plasters a Gather node at the top of the plan, causing
> the whole plan including the FDW access to happen inside a parallel
> worker.  The only way to prevent that is to clear the
> wholePlanParallelSafe flag, which as far as I can tell (not that any of
> this is documented worth a damn) isn't something that
> set_rel_consider_parallel is supposed to do.

Hmm.  Well, if you tested it, or looked at the places where
wholePlanParallelSafe is cleared, you would find that it DOES fix the
bug.  create_plan() clears wholePlanParallelSafe if the plan is not
parallel-safe, and the plan won't be parallel-safe unless
consider_parallel was set for the underlying relation.  In case you'd
like to test it for yourself, here's the PoC patch I wrote:

diff --git a/src/backend/optimizer/path/allpaths.c
b/src/backend/optimizer/path/allpaths.c
index bcb668f..8a4179e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -527,6 +527,11 @@ set_rel_consider_parallel(PlannerInfo *root,
RelOptInfo *rel,                                       return;                               return;
  }
 
+
+                       /* Not for foreign tables. */
+                       if (rte->relkind == RELKIND_FOREIGN_TABLE)
+                               return;
+                       break;
               case RTE_SUBQUERY:

Adding that makes the postgres_fdw case pass.

> It looks to me like there is a good deal of fuzzy thinking here about the
> difference between locally parallelizable and globally parallelizable
> plans, ie Gather at the top vs Gather somewhere else.

If you have a specific complaint, I'm happy to try to improve things,
or you can.  I think however that it is also possible that you haven't
fully understood the code I've spent the last year or so developing
yet, possibly because I haven't documented it well enough, but
possibly also because you haven't spent much time looking on it yet.
I'm glad you are, by the way, because I'm sure there are a bunch of
things here that you can improve over what I was able to do,
especially on the planner side of things, and that would be great.
However, a bit of forbearance would be appreciated.

> I also note with
> dismay that turning force_parallel_mode on seems to pretty much disable
> any testing of local parallelism.

No, I don't think so.  It doesn't push a Gather node on top of a plan
that already contains a Gather, because such a plan isn't
parallel_safe.   Nor does it suppress generation of parallel paths
otherwise.

>> In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
>> blindly assume that foreign scans are not parallel-safe, but we can't
>> blindly assume the opposite either.  Maybe we should assume that the
>> foreign scan is parallel-safe only if one or more of the new methods
>> introduced by the aforementioned commit are set, but actually that
>> doesn't seem quite right.  That would tell us whether the scan itself
>> can be parallelized, not whether it's safe to run serially but within
>> a parallel worker.  I think maybe we need a new FDW API that gets
>> called from set_rel_consider_parallel() with the root, rel, and rte as
>> arguments and which can return a Boolean.  If the callback is not set,
>> assume false.
>
> Meh.  As things stand, postgres_fdw would have to aver that it can't ever
> be safely parallelized, which doesn't seem like a very satisfactory answer
> even if there are other FDWs that work differently (and which would those
> be?  None that use a socket-style connection to an external server.)

file_fdw is parallel-safe, and KaiGai posted a patch that makes it
parallel-aware, though that would have needed more work than I'm
willing to put in right now to make it committable.  So in other
words...

> The commit you mention above seems to me to highlight the dangers of
> accepting hook patches with no working use-case to back them up.
> AFAICT it's basically useless for typical FDWs because of this
> multiple-connection problem.

...I didn't ignore this principal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Thomas Munro
Дата:
On Tue, Feb 23, 2016 at 6:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 23, 2016 at 2:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> Foreign tables are supposed to be categorically excluded from
>>>> parallelism.  Not sure why that's not working in this instance.
>>
>> BTW, I wonder where you think that's supposed to be enforced, because
>> I sure can't find any such logic.
>>
>> I suppose that has_parallel_hazard() would be the logical place to
>> notice foreign tables, but it currently doesn't even visit RTEs,
>> much less contain any code to check if their tables are foreign.
>> Or did you have another place in mind to do that?
>
> RTEs are checked in set_rel_consider_parallel(), and I thought there
> was a check there related to foreign tables, but there isn't.  Oops.
> In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
> blindly assume that foreign scans are not parallel-safe, but we can't
> blindly assume the opposite either.  Maybe we should assume that the
> foreign scan is parallel-safe only if one or more of the new methods
> introduced by the aforementioned commit are set, but actually that
> doesn't seem quite right.  That would tell us whether the scan itself
> can be parallelized, not whether it's safe to run serially but within
> a parallel worker.  I think maybe we need a new FDW API that gets
> called from set_rel_consider_parallel() with the root, rel, and rte as
> arguments and which can return a Boolean.  If the callback is not set,
> assume false.

Here is a first pass at that.  The patch adds
IsForeignScanParallelSafe to the FDW API.   postgres_fdw returns false
(unnecessary but useful to verify that the regression test breaks if
you change it to true), and others don't provide the function so fall
back to false.

I suspect there may be opportunities to return true even if snapshots
and uncommitted reads aren't magically coordinated among workers.  For
example: users of MongoDB-type systems and text files have no
expectation of either snapshot semantics or transaction isolation in
the first place, so doing stuff in parallel won't be any less safe
than usual as far as visibility is concerned; postgres_fdw could in
theory export/import snapshots and allow parallelism in limited cases
if it can somehow prove there have been no uncommitted writes; and
non-MVCC/snapshot RDBMSs might be OK in lower isolation levels if you
haven't written anything or have explicitly opted in to uncommitted
reads (otherwise you'd risk invisible deadlock against the leader when
trying to read what it has written).

Please also find attached a tiny patch to respect TEMP_CONFIG for contribs.

--
Thomas Munro
http://www.enterprisedb.com

Вложения

Re: postgres_fdw vs. force_parallel_mode on ppc

От
Thomas Munro
Дата:
On Wed, Feb 24, 2016 at 5:48 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here is a first pass at that. [...]

On Wed, Feb 24, 2016 at 1:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> file_fdw is parallel-safe, ...

And here is a patch to apply on top of the last one, to make file_fdw
return true.  But does it really work correctly under parallelism?

--
Thomas Munro
http://www.enterprisedb.com

Вложения

Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Wed, Feb 24, 2016 at 12:59 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Feb 24, 2016 at 5:48 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here is a first pass at that. [...]
>
> On Wed, Feb 24, 2016 at 1:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> file_fdw is parallel-safe, ...
>
> And here is a patch to apply on top of the last one, to make file_fdw
> return true.  But does it really work correctly under parallelism?

Seems like it.  Running the regression tests for file_fdw under
force_parallel_mode=regress, max_parallel_degree>0 passes; you can
verify that's actually doing something by using
force_parallel_mode=on, which will result in some predictable
failures.  From a theoretical point of view, there's no reason I can
see why reading a file shouldn't work just as well from a parallel
worker as from the leader.  They both have the same view of the
filesystem, and in neither case are we trying to write any data; we're
just trying to read it.

Committed these patches after revising the comment you wrote and
adding documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Noah Misch
Дата:
On Fri, Feb 26, 2016 at 04:16:58PM +0530, Robert Haas wrote:
> Committed these patches after revising the comment you wrote and
> adding documentation.

I've modified buildfarm member mandrill to use force_parallel_mode=regress and
max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
fails the stats test, like Tom witnessed:
http://www.postgresql.org/message-id/30385.1456077923@sss.pgh.pa.us



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Sat, Feb 27, 2016 at 7:05 PM, Noah Misch <noah@leadboat.com> wrote:
> On Fri, Feb 26, 2016 at 04:16:58PM +0530, Robert Haas wrote:
>> Committed these patches after revising the comment you wrote and
>> adding documentation.
>
> I've modified buildfarm member mandrill to use force_parallel_mode=regress and
> max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
> fails the stats test, like Tom witnessed:
> http://www.postgresql.org/message-id/30385.1456077923@sss.pgh.pa.us

Thank you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> I've modified buildfarm member mandrill to use force_parallel_mode=regress and
> max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
> fails the stats test, like Tom witnessed:
> http://www.postgresql.org/message-id/30385.1456077923@sss.pgh.pa.us

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-03-02%2023%3A34%3A10
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Thu, Mar 3, 2016 at 1:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Noah Misch <noah@leadboat.com> writes:
>> I've modified buildfarm member mandrill to use force_parallel_mode=regress and
>> max_parallel_degree=5; a full run passes.  We'll now see if it intermittently
>> fails the stats test, like Tom witnessed:
>> http://www.postgresql.org/message-id/30385.1456077923@sss.pgh.pa.us
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-03-02%2023%3A34%3A10

A couple of my colleagues have been looking into this.  It's not
entirely clear to me what's going on here yet, but it looks like the
stats get there if you wait long enough.  Rahila Syed was able to
reproduce the problem and says that the attached patch fixes it.  But
I don't quite understand why this should fix it.

BTW, this comment is obsolete:

-- force the rate-limiting logic in pgstat_report_tabstat() to time out
-- and send a message
SELECT pg_sleep(1.0);
 pg_sleep
----------

(1 row)

That function was renamed in commit
93c701edc6c6f065cd25f77f63ab31aff085d6ac, but apparently Tom forgot to
grep for other uses of that identifier name.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> A couple of my colleagues have been looking into this.  It's not
> entirely clear to me what's going on here yet, but it looks like the
> stats get there if you wait long enough.  Rahila Syed was able to
> reproduce the problem and says that the attached patch fixes it.  But
> I don't quite understand why this should fix it.

I don't like this patch much.  While the new function is not bad in
itself, it looks really weird to call it immediately after the other
wait function.  And the reason for that, AFAICT, is that somebody dropped
the entire "truncation stats" test sequence into the middle of unrelated
tests, evidently in the vain hope that that way they could piggyback
on the existing wait.  Which these failures are showing us is wrong.

I think we should move all the inserted logic down so that it's not in the
middle of unrelated testing.

> BTW, this comment is obsolete:

> -- force the rate-limiting logic in pgstat_report_tabstat() to time out
> -- and send a message
> SELECT pg_sleep(1.0);
>  pg_sleep
> ----------

> (1 row)

> That function was renamed in commit
> 93c701edc6c6f065cd25f77f63ab31aff085d6ac, but apparently Tom forgot to
> grep for other uses of that identifier name.

Duh :-(.  Actually, do we need that sleep at all anymore?  Seems like
wait_for_stats ought to cover it.
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Fri, Mar 4, 2016 at 12:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> A couple of my colleagues have been looking into this.  It's not
>> entirely clear to me what's going on here yet, but it looks like the
>> stats get there if you wait long enough.  Rahila Syed was able to
>> reproduce the problem and says that the attached patch fixes it.  But
>> I don't quite understand why this should fix it.
>
> I don't like this patch much.  While the new function is not bad in
> itself, it looks really weird to call it immediately after the other
> wait function.  And the reason for that, AFAICT, is that somebody dropped
> the entire "truncation stats" test sequence into the middle of unrelated
> tests, evidently in the vain hope that that way they could piggyback
> on the existing wait.  Which these failures are showing us is wrong.
>
> I think we should move all the inserted logic down so that it's not in the
> middle of unrelated testing.

Sure.  If you have an idea what the right thing to do is, please go
ahead.  I actually don't have a clear idea what's going on here.  I
guess it's that the wait_for_stats() guarantees that the stats message
from the index insertion has been received but the status messages
from the "trunc" tables might not have gotten there yet.  I thought
maybe that works without parallelism because all of those messages are
coming from the same backend, and therefore if you have the later one
you must have all of the earlier ones, too.  But if you're running
some of the queries in parallel workers then it's possible for a stats
message from a worker run later to arrive later.

But it's not that after all, because when I run the regression tests
with the pg_sleep removed, I get this:

*** /Users/rhaas/pgsql/src/test/regress/expected/stats.out
2016-03-04 08:55:33.000000000 -0500
--- /Users/rhaas/pgsql/src/test/regress/results/stats.out
2016-03-04 09:00:29.000000000 -0500
***************
*** 127,140 ****      1 (1 row)

- -- force the rate-limiting logic in pgstat_report_tabstat() to time out
- -- and send a message
- SELECT pg_sleep(1.0);
-  pg_sleep
- ----------
-
- (1 row)
- -- wait for stats collector to update SELECT wait_for_stats();  wait_for_stats
--- 127,132 ----
***************
*** 148,158 ****  WHERE relname like 'trunc_stats_test%' order by relname;       relname      | n_tup_ins | n_tup_upd |
n_tup_del| n_live_tup
 
| n_dead_tup -------------------+-----------+-----------+-----------+------------+------------
!  trunc_stats_test  |         3 |         0 |         0 |          0
|          0
!  trunc_stats_test1 |         4 |         2 |         1 |          1
|          0
!  trunc_stats_test2 |         1 |         0 |         0 |          1
|          0
!  trunc_stats_test3 |         4 |         0 |         0 |          2
|          2
!  trunc_stats_test4 |         2 |         0 |         0 |          0
|          2 (5 rows)
 SELECT st.seq_scan >= pr.seq_scan + 1,
--- 140,150 ----  WHERE relname like 'trunc_stats_test%' order by relname;       relname      | n_tup_ins | n_tup_upd |
n_tup_del| n_live_tup
 
| n_dead_tup -------------------+-----------+-----------+-----------+------------+------------
!  trunc_stats_test  |         0 |         0 |         0 |          0
|          0
!  trunc_stats_test1 |         0 |         0 |         0 |          0
|          0
!  trunc_stats_test2 |         0 |         0 |         0 |          0
|          0
!  trunc_stats_test3 |         0 |         0 |         0 |          0
|          0
!  trunc_stats_test4 |         0 |         0 |         0 |          0
|          0 (5 rows)
 SELECT st.seq_scan >= pr.seq_scan + 1,
***************
*** 163,169 ****  WHERE st.relname='tenk2' AND cl.relname='tenk2';  ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
!  t        | t        | t        | t (1 row)
 SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
--- 155,161 ----  WHERE st.relname='tenk2' AND cl.relname='tenk2';  ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
!  f        | f        | f        | f (1 row)
 SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
***************
*** 172,178 ****  WHERE st.relname='tenk2' AND cl.relname='tenk2';  ?column? | ?column? ----------+----------
!  t        | t (1 row)
 SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer
--- 164,170 ----  WHERE st.relname='tenk2' AND cl.relname='tenk2';  ?column? | ?column? ----------+----------
!  t        | f (1 row)
 SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer

That looks suspiciously similar to the failure we're getting with the
force_parallel_mode testing, but I'm still confused.

>> BTW, this comment is obsolete:
>
>> -- force the rate-limiting logic in pgstat_report_tabstat() to time out
>> -- and send a message
>> SELECT pg_sleep(1.0);
>>  pg_sleep
>> ----------
>
>> (1 row)
>
>> That function was renamed in commit
>> 93c701edc6c6f065cd25f77f63ab31aff085d6ac, but apparently Tom forgot to
>> grep for other uses of that identifier name.
>
> Duh :-(.  Actually, do we need that sleep at all anymore?  Seems like
> wait_for_stats ought to cover it.

Yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Sure.  If you have an idea what the right thing to do is, please go
> ahead.

Yeah, I'll modify the patch and commit sometime later today.

> I actually don't have a clear idea what's going on here.  I
> guess it's that the wait_for_stats() guarantees that the stats message
> from the index insertion has been received but the status messages
> from the "trunc" tables might not have gotten there yet.

That's what it looks like to me.  I now think that the apparent
connection to parallel query is a mirage.  The reason we've only
seen a few cases so far is that the flapping test is new: it
wad added in commit d42358efb16cc811, on 20 Feb.  If we left it
as-is, I think we'd eventually see the same failure without forcing
parallel mode.  In fact, that's pretty much what you describe below,
isn't it?  The pg_sleep is sort of half-bakedly substituting for
a proper wait.
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Sure.  If you have an idea what the right thing to do is, please go
> > ahead.
> 
> Yeah, I'll modify the patch and commit sometime later today.
> 
> > I actually don't have a clear idea what's going on here.  I
> > guess it's that the wait_for_stats() guarantees that the stats message
> > from the index insertion has been received but the status messages
> > from the "trunc" tables might not have gotten there yet.
> 
> That's what it looks like to me.  I now think that the apparent
> connection to parallel query is a mirage.  The reason we've only
> seen a few cases so far is that the flapping test is new: it
> wad added in commit d42358efb16cc811, on 20 Feb.  If we left it
> as-is, I think we'd eventually see the same failure without forcing
> parallel mode.  In fact, that's pretty much what you describe below,
> isn't it?  The pg_sleep is sort of half-bakedly substituting for
> a proper wait.

It was added on Feb 20 all right, but of *last year*.  It's been there
working happily for a year now.

The reason I added the trunc test in the middle of the index update
tests is that I dislike tests that sleep for long without real purpose;
it seems pretty reasonable to me to have both sleeps actually be the
same wait.

Instead of adding another sleep function, another possibility is to add
two booleans, one for the index counter and another for the truncate
counters, and only terminate the sleep if both are true.  I don't see
any reason to make this test any slower than it already is.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Fri, Mar 4, 2016 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Sure.  If you have an idea what the right thing to do is, please go
>> ahead.
>
> Yeah, I'll modify the patch and commit sometime later today.

OK, if you're basing that on the patch I sent upthread, please credit
Rahila Syed as the original author of that code.  (I modified it
before posting, but only trivially.)  Of course if you do something
else, then never mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> That's what it looks like to me.  I now think that the apparent
>> connection to parallel query is a mirage.  The reason we've only
>> seen a few cases so far is that the flapping test is new: it
>> wad added in commit d42358efb16cc811, on 20 Feb.

> It was added on Feb 20 all right, but of *last year*.  It's been there
> working happily for a year now.

Wup, you're right, failed to look closely enough at the commit log
entry.  So that puts us back to wondering why exactly parallel query
is triggering this.  Still, Robert's experiment with removing the
pg_sleep seems fairly conclusive: it is possible to get the failure
without parallel query.

> Instead of adding another sleep function, another possibility is to add
> two booleans, one for the index counter and another for the truncate
> counters, and only terminate the sleep if both are true.  I don't see
> any reason to make this test any slower than it already is.

Well, that would make the function more complicated, but maybe it's a
better answer.  On the other hand, we know that the stats updates are
delivered in a deterministic order, so why not simply replace the
existing test in the wait function with one that looks for the truncation
updates?  If we've gotten those, we must have gotten the earlier ones.

In any case, the real answer to making the test less slow is to get rid of
that vestigial pg_sleep.  I'm wondering why we failed to remove that when
we put in the wait_for_stats function...
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> That's what it looks like to me.  I now think that the apparent
>>> connection to parallel query is a mirage.  The reason we've only
>>> seen a few cases so far is that the flapping test is new: it
>>> wad added in commit d42358efb16cc811, on 20 Feb.
>
>> It was added on Feb 20 all right, but of *last year*.  It's been there
>> working happily for a year now.
>
> Wup, you're right, failed to look closely enough at the commit log
> entry.  So that puts us back to wondering why exactly parallel query
> is triggering this.  Still, Robert's experiment with removing the
> pg_sleep seems fairly conclusive: it is possible to get the failure
> without parallel query.
>
>> Instead of adding another sleep function, another possibility is to add
>> two booleans, one for the index counter and another for the truncate
>> counters, and only terminate the sleep if both are true.  I don't see
>> any reason to make this test any slower than it already is.
>
> Well, that would make the function more complicated, but maybe it's a
> better answer.  On the other hand, we know that the stats updates are
> delivered in a deterministic order, so why not simply replace the
> existing test in the wait function with one that looks for the truncation
> updates?  If we've gotten those, we must have gotten the earlier ones.

I'm not sure if that's actually true with parallel mode.  I'm pretty
sure the earlier workers will have terminated before the later ones
start, but is that enough to guarantee that the stats collector sees
the messages in that order?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Alvaro Herrera
Дата:
Robert Haas wrote:

> I'm not sure if that's actually true with parallel mode.  I'm pretty
> sure the earlier workers will have terminated before the later ones
> start, but is that enough to guarantee that the stats collector sees
> the messages in that order?

Um.  So if you have two queries that run in sequence, it's possible
for workers of the first query to be still running when workers for the
second query finish?  That would be very strange.

If that's not what you're saying, I don't understand what guarantees you
say we don't have.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, that would make the function more complicated, but maybe it's a
>> better answer.  On the other hand, we know that the stats updates are
>> delivered in a deterministic order, so why not simply replace the
>> existing test in the wait function with one that looks for the truncation
>> updates?  If we've gotten those, we must have gotten the earlier ones.

> I'm not sure if that's actually true with parallel mode.  I'm pretty
> sure the earlier workers will have terminated before the later ones
> start, but is that enough to guarantee that the stats collector sees
> the messages in that order?

Huh?  Parallel workers are read-only; what would they be doing sending
any of these messages?
        regards, tom lane



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Robert Haas
Дата:
On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, that would make the function more complicated, but maybe it's a
>>> better answer.  On the other hand, we know that the stats updates are
>>> delivered in a deterministic order, so why not simply replace the
>>> existing test in the wait function with one that looks for the truncation
>>> updates?  If we've gotten those, we must have gotten the earlier ones.
>
>> I'm not sure if that's actually true with parallel mode.  I'm pretty
>> sure the earlier workers will have terminated before the later ones
>> start, but is that enough to guarantee that the stats collector sees
>> the messages in that order?
>
> Huh?  Parallel workers are read-only; what would they be doing sending
> any of these messages?

Mumble.  I have no idea what's happening here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw vs. force_parallel_mode on ppc

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Huh?  Parallel workers are read-only; what would they be doing sending
>> any of these messages?

> Mumble.  I have no idea what's happening here.

OK, after inserting a bunch of debug logging I have figured out what is
happening.  The updates on trunc_stats_test et al, being updates, are
done in the session's main backend.  But we also have these queries:

-- do a seqscan
SELECT count(*) FROM tenk2;
-- do an indexscan
SELECT count(*) FROM tenk2 WHERE unique1 = 1;

These can be, and are, done in parallel worker processes (and not
necessarily the same one, either).  AFAICT, the parallel worker
processes send their stats messages to the stats collector more or
less immediately after processing their queries.  However, because
of the rate-limiting logic in pgstat_report_stat, the main backend
doesn't.  The point of that "pg_sleep(1.0)" (which was actually added
*after* wait_for_stats) is to ensure that the half-second delay in
the rate limiter has been soaked up, and the stats messages sent,
before we start waiting for the results to become visible in the
stats collector's output.

So the sequence of events when we get a failure looks like

1. parallel workers send stats updates for seqscan and indexscan
on tenk2.

2. stats collector emits output files, probably as a result of
an autovacuum request.

3. session's main backend finishes "pg_sleep(1.0)" and sends
stats updates for what it's done lately, including the
updates on trunc_stats_test et al.

4. wait_for_stats() observes that the tenk2 idx_scan count has
already advanced and figures it need not wait at all.

5. We print stale stats for trunc_stats_test et al.

So it appears to me that to make this robust, we need to adjust
wait_for_stats to verify advances on *all three of* the tenk2
seq_scan count, the tenk2 idx_scan count, and at least one of
the trunc_stats_test tables' counters, because those could be
coming from three different backend processes.

If we ever allow parallel workers to do writes, this will really
become a mess.
        regards, tom lane