Обсуждение: PGOPTIONS="-fh" make check gets stuck since Postgres 11

Поиск
Список
Период
Сортировка

PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Michael Paquier
Дата:
Hi all,

I have begun playing with regressplans.sh which enforces various
combinations of "-f s|i|n|m|h" when running the regression tests, and
I have noticed that -fh can cause the server to become stuck in the
test join_hash.sql with this query (not sure which portion of the SET
LOCAL parameters are involved) :
select count(*) from simple r join extremely_skewed s using (id);

This does not happen with REL_10_STABLE where the test executes
immediately, so we has visibly an issue caused by v11 here.

Any thoughts?
--
Michael

Вложения

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Thomas Munro
Дата:
On Mon, Jul 8, 2019 at 5:53 PM Michael Paquier <michael@paquier.xyz> wrote:
> I have begun playing with regressplans.sh which enforces various
> combinations of "-f s|i|n|m|h" when running the regression tests, and
> I have noticed that -fh can cause the server to become stuck in the
> test join_hash.sql with this query (not sure which portion of the SET
> LOCAL parameters are involved) :
> select count(*) from simple r join extremely_skewed s using (id);
>
> This does not happen with REL_10_STABLE where the test executes
> immediately, so we has visibly an issue caused by v11 here.

If you don't allow hash joins it makes this plan:

 Aggregate
   ->  Nested Loop
         Join Filter: (r.id = s.id)
         ->  Seq Scan on simple r
         ->  Materialize
               ->  Seq Scan on extremely_skewed s

"simple" has 20k rows and "extremely_skewed" has 20k rows but the
planner thinks it only has 2.  So this going to take O(n^2) time and n
is 20k.  Not sure what to do about that.  Maybe "join_hash" should be
skipped for the -h (= no hash joins please) case?

-- 
Thomas Munro
https://enterprisedb.com



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Michael Paquier
Дата:
On Mon, Jul 08, 2019 at 06:49:44PM +1200, Thomas Munro wrote:
> "simple" has 20k rows and "extremely_skewed" has 20k rows but the
> planner thinks it only has 2.  So this going to take O(n^2) time and n
> is 20k.  Not sure what to do about that.  Maybe "join_hash" should be
> skipped for the -h (= no hash joins please) case?

Ah, thanks.  Yes that's going to take a while :)

Well, another thing I'd like to think about is if there is any point
to keep regressplans.sh and the relevant options in postgres at this
stage.  From the top of the file one can read that:
# This script runs the Postgres regression tests with all useful combinations
# of the backend options that disable various query plan types.  If the
# results are not all the same, it may indicate a bug in a particular
# plan type, or perhaps just a regression test whose results aren't fully
# determinate (eg, due to lack of an ORDER BY keyword).

However if you run any option with make check, then in all runs there
are tests failing.  We can improve the situation for some of them with
ORDER BY queries by looking at the query outputs, but some EXPLAIN
queries are sensitive to that, and the history around regressplans.sh
does not play in favor of it (some people really use it?).  If you
look at the latest commits for it, it has not been really touched in
19 years.

So I would be rather in favor in nuking it.
--
Michael

Вложения

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> I have begun playing with regressplans.sh which enforces various
> combinations of "-f s|i|n|m|h" when running the regression tests, and
> I have noticed that -fh can cause the server to become stuck in the
> test join_hash.sql with this query (not sure which portion of the SET
> LOCAL parameters are involved) :
> select count(*) from simple r join extremely_skewed s using (id);

> This does not happen with REL_10_STABLE where the test executes
> immediately, so we has visibly an issue caused by v11 here.

Yeah, these test cases were added by fa330f9ad in v11.

What it looks like to me is that some of these test cases force
"enable_mergejoin = off", so if you also have enable_hashjoin off then
you are going to get a nestloop plan, and it's hardly surprising that
that takes an unreasonable amount of time on the rather large test
tables used in these tests.

Given the purposes of this test, I think it'd be reasonable to force
both enable_hashjoin = on and enable_mergejoin = off at the very top
of the join_hash script, or the corresponding place in join.sql in
v11.  Thomas, was there a specific reason for forcing enable_mergejoin
= off for only some of these tests?

            regards, tom lane



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> Well, another thing I'd like to think about is if there is any point
> to keep regressplans.sh and the relevant options in postgres at this
> stage.  From the top of the file one can read that:

The point of regressplans.sh is to see if anything goes seriously
wrong when forcing non-default plan choices --- seriously wrong being
defined as crashes or semantically wrong answers.  It's not expected
that the regression tests will automatically pass when you do that,
because of their dependencies on output row ordering, not to mention
all the EXPLAINs.  I'm not for removing it --- the fact that its
results require manual evaluation doesn't make it useless.

Having said that, join_hash.sql in particular seems to have zero
value if it's not testing hash joins, so I think it'd be reasonable
for it to override a global enable_hashjoin = off setting.  None of
the other regression test scripts seem to take nearly as much of a
performance hit from globally forcing poor plans.

            regards, tom lane



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Michael Paquier
Дата:
On Mon, Jul 08, 2019 at 03:21:41PM -0400, Tom Lane wrote:
> Having said that, join_hash.sql in particular seems to have zero
> value if it's not testing hash joins, so I think it'd be reasonable
> for it to override a global enable_hashjoin = off setting.  None of
> the other regression test scripts seem to take nearly as much of a
> performance hit from globally forcing poor plans.

I am a bit confused here.  Don't you mean to have enable_hashjoin =
*on* at the top of hash_join.sql instead like in the attached?
--
Michael

Вложения

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Thomas Munro
Дата:
On Tue, Jul 9, 2019 at 2:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Given the purposes of this test, I think it'd be reasonable to force
> both enable_hashjoin = on and enable_mergejoin = off at the very top
> of the join_hash script, or the corresponding place in join.sql in
> v11.  Thomas, was there a specific reason for forcing enable_mergejoin
> = off for only some of these tests?

Based on a suggestion from Andres (if I recall correctly), I wrapped
each individual test in savepoint/rollback, and then set just the GUCs
needed to get the plan shape and execution code path I wanted to
exercise, and I guess I found that I only needed to disable merge
joins for some of them.  The idea was that the individual tests could
be understood independently.

-- 
Thomas Munro
https://enterprisedb.com



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Jul 08, 2019 at 03:21:41PM -0400, Tom Lane wrote:
>> Having said that, join_hash.sql in particular seems to have zero
>> value if it's not testing hash joins, so I think it'd be reasonable
>> for it to override a global enable_hashjoin = off setting.  None of
>> the other regression test scripts seem to take nearly as much of a
>> performance hit from globally forcing poor plans.

> I am a bit confused here.  Don't you mean to have enable_hashjoin =
> *on* at the top of hash_join.sql instead like in the attached?

Right, overriding any enable_hashjoin = off that might've come from
PGOPTIONS or wherever.

            regards, tom lane



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Jul 9, 2019 at 2:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Given the purposes of this test, I think it'd be reasonable to force
>> both enable_hashjoin = on and enable_mergejoin = off at the very top
>> of the join_hash script, or the corresponding place in join.sql in
>> v11.  Thomas, was there a specific reason for forcing enable_mergejoin
>> = off for only some of these tests?

> Based on a suggestion from Andres (if I recall correctly), I wrapped
> each individual test in savepoint/rollback, and then set just the GUCs
> needed to get the plan shape and execution code path I wanted to
> exercise, and I guess I found that I only needed to disable merge
> joins for some of them.  The idea was that the individual tests could
> be understood independently.

But per this discussion, they can only be "understood independently"
if you make some assumptions about the prevailing values of the
planner GUCs.

            regards, tom lane



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Thomas Munro
Дата:
On Tue, Jul 9, 2019 at 2:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Based on a suggestion from Andres (if I recall correctly), I wrapped
> > each individual test in savepoint/rollback, and then set just the GUCs
> > needed to get the plan shape and execution code path I wanted to
> > exercise, and I guess I found that I only needed to disable merge
> > joins for some of them.  The idea was that the individual tests could
> > be understood independently.
>
> But per this discussion, they can only be "understood independently"
> if you make some assumptions about the prevailing values of the
> planner GUCs.

Yeah.  I had obviously never noticed that test script.  +1 for just
enabling hash joins the top of join_hash.sql in 12+, and the
equivalent section in 11's join.sql (which is luckily at the end of
the file).

-- 
Thomas Munro
https://enterprisedb.com



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Michael Paquier
Дата:
On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:
> Yeah.  I had obviously never noticed that test script.  +1 for just
> enabling hash joins the top of join_hash.sql in 12+, and the
> equivalent section in 11's join.sql (which is luckily at the end of
> the file).

Right, I did not pay much attention to REL_11_STABLE.  In this case
the test begins around line 2030 and reaches the bottom of the file.
I would actually add a RESET at the bottom of it to avoid any tests to
be impacted, as usually bug-fix tests are just appended.  Thomas,
perhaps you would prefer fixing it yourself?  Or should I?
--
Michael

Вложения

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:
>> Yeah.  I had obviously never noticed that test script.  +1 for just
>> enabling hash joins the top of join_hash.sql in 12+, and the
>> equivalent section in 11's join.sql (which is luckily at the end of
>> the file).

> Right, I did not pay much attention to REL_11_STABLE.  In this case
> the test begins around line 2030 and reaches the bottom of the file.
> I would actually add a RESET at the bottom of it to avoid any tests to
> be impacted, as usually bug-fix tests are just appended.

Agreed that the scope should be limited.  But in 12/HEAD, I think the
relevant tests are all wrapped into one transaction block, so that
using SET LOCAL would be enough.  Not sure if 11 is the same.

            regards, tom lane



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Thomas Munro
Дата:
On Tue, Jul 9, 2019 at 2:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:
> > Yeah.  I had obviously never noticed that test script.  +1 for just
> > enabling hash joins the top of join_hash.sql in 12+, and the
> > equivalent section in 11's join.sql (which is luckily at the end of
> > the file).
>
> Right, I did not pay much attention to REL_11_STABLE.  In this case
> the test begins around line 2030 and reaches the bottom of the file.
> I would actually add a RESET at the bottom of it to avoid any tests to
> be impacted, as usually bug-fix tests are just appended.  Thomas,
> perhaps you would prefer fixing it yourself?  Or should I?

It's my mistake.  I'll fix it later today.

-- 
Thomas Munro
https://enterprisedb.com



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Michael Paquier
Дата:
On Tue, Jul 09, 2019 at 03:04:27PM +1200, Thomas Munro wrote:
> It's my mistake.  I'll fix it later today.

Thanks!
--
Michael

Вложения

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Melanie Plageman
Дата:


On Mon, Jul 8, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The point of regressplans.sh is to see if anything goes seriously
wrong when forcing non-default plan choices --- seriously wrong being
defined as crashes or semantically wrong answers.  It's not expected
that the regression tests will automatically pass when you do that,
because of their dependencies on output row ordering, not to mention
all the EXPLAINs.  I'm not for removing it --- the fact that its
results require manual evaluation doesn't make it useless.


It might be worth post-processing results files to ignore row ordering
in some cases to allow for easier comparison. Has this been proposed
in the past?

--
Melanie Plageman

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Michael Paquier
Дата:
On Tue, Jul 09, 2019 at 11:54:29AM -0700, Melanie Plageman wrote:
> It might be worth post-processing results files to ignore row ordering
> in some cases to allow for easier comparison. Has this been proposed
> in the past?

Not that I recall.
--
Michael

Вложения

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Amit Kapila
Дата:
On Wed, Jul 10, 2019 at 10:12 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jul 09, 2019 at 11:54:29AM -0700, Melanie Plageman wrote:
> > It might be worth post-processing results files to ignore row ordering
> > in some cases to allow for easier comparison. Has this been proposed
> > in the past?
>
> Not that I recall.
>

It would be good if we can come up with something like that.  It will
be helpful for zheap, where in some cases we get different row
ordering due to in-place updates.  As of now, we try to add Order By
or do some extra magic to get consistent row ordering.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Michael Paquier
Дата:
On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:
> It would be good if we can come up with something like that.  It will
> be helpful for zheap, where in some cases we get different row
> ordering due to in-place updates.  As of now, we try to add Order By
> or do some extra magic to get consistent row ordering.

That was an issue for me as well when working with Postgres-XC when
the row ordering was not guaranteed depending on the number of nodes
(speaking of which Greenplum has the same issues, no?).  Adding ORDER
BY clauses to a set of tests may make sense, but then this may impact
the plans generated for some of them..
--
Michael

Вложения

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Melanie Plageman
Дата:


On Wed, Jul 10, 2019 at 12:40 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:
> It would be good if we can come up with something like that.  It will
> be helpful for zheap, where in some cases we get different row
> ordering due to in-place updates.  As of now, we try to add Order By
> or do some extra magic to get consistent row ordering.

That was an issue for me as well when working with Postgres-XC when
the row ordering was not guaranteed depending on the number of nodes
(speaking of which Greenplum has the same issues, no?).  Adding ORDER
BY clauses to a set of tests may make sense, but then this may impact
the plans generated for some of them..
--
Michael

We have a tool that does this. gpdiff [1] is used for results post-processing
and it uses a perl module called atmsort [2] to deal with the specific ORDER BY
case discussed here.

[1] https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/gpdiff.pl
[2] https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/atmsort.pl

--
Melanie Plageman

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:
>> It would be good if we can come up with something like that.  It will
>> be helpful for zheap, where in some cases we get different row
>> ordering due to in-place updates.  As of now, we try to add Order By
>> or do some extra magic to get consistent row ordering.

> That was an issue for me as well when working with Postgres-XC when
> the row ordering was not guaranteed depending on the number of nodes
> (speaking of which Greenplum has the same issues, no?).  Adding ORDER
> BY clauses to a set of tests may make sense, but then this may impact
> the plans generated for some of them..

Yeah, I do not want to get into a situation where we can't test
queries that lack ORDER BY.  Also, the fact that tableam X doesn't
reproduce heap's row ordering is not a good reason to relax the
strength of the tests for heap.  So I'm wondering about some
postprocessing that we could optionally apply.  Perhaps the tools
Melanie mentions could help.

            regards, tom lane



Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Ashwin Agrawal
Дата:

On Wed, Jul 10, 2019 at 6:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:
>> It would be good if we can come up with something like that.  It will
>> be helpful for zheap, where in some cases we get different row
>> ordering due to in-place updates.  As of now, we try to add Order By
>> or do some extra magic to get consistent row ordering.

> That was an issue for me as well when working with Postgres-XC when
> the row ordering was not guaranteed depending on the number of nodes
> (speaking of which Greenplum has the same issues, no?).  Adding ORDER
> BY clauses to a set of tests may make sense, but then this may impact
> the plans generated for some of them..

Yeah, I do not want to get into a situation where we can't test
queries that lack ORDER BY.  Also, the fact that tableam X doesn't
reproduce heap's row ordering is not a good reason to relax the
strength of the tests for heap.  So I'm wondering about some
postprocessing that we could optionally apply.  Perhaps the tools
Melanie mentions could help.

Surprisingly, I have been working from a couple of days to use those
Perl tools from Greenplum for Zedstore. As for Zedstore plans differ
for many regress tests because relation size not being the same as
heap and all. Also, for similar reasons, row orders change as
well. So, to effectively use the test untouched to validate Zedstore
and yes was thinking will help Zheap testing as well. I also tested
the same for regressplans.sh and it will lift a lot of manual burden
of investigating the results. As one can specify to completely ignore
explain plan outputs from the comparison between results and
expected. Will post patch for the tool, once I get in little decent
shape.

Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

От
Michael Paquier
Дата:
On Wed, Jul 10, 2019 at 09:11:41AM -0700, Ashwin Agrawal wrote:
> Will post patch for the tool, once I get in little decent shape.

That would be nice!  We may be able to get something into v13 this way
then.
--
Michael

Вложения