Thread: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)

Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)

From: Greg Sabino Mullane
Date:
I don't have a full test case yet, but I did finally manage to get an
explain analyze to finish in a sane amount of time on 8.2.5. Attached
are two cleaned up explain analyze results, using the exact same data
directory but different executables: one is 8.2.3 and returns as
expected, the other is 8.2.5, which generates a slow plan despite any
fiddling with geqo/join_collapse_limit, etc. The cost is the same, but
it makes a wrong turn partway through the plan. This 8.2.5 build has
the patches Tom posted earlier applied to it (also tested on 8.2.4 and
8.2.5 with the same result). An earlier version of the query
with tables truncated to 100000 rows ran in 70 seconds on 8.2.3, and did
not finish after an hour on 8.2.5. This version has the tables truncated
to a mere 10000 rows each. The query itself is a view calling some large
views, which call other views and functions, etc. I can post a version
of it if needed, but my energy is mostly focused now on making a
reproducible test case. Now that this is locally reproducible in a
finite amount of time, patches and tweaking suggestions are welcome.
(Postgres was built from source with no special flags, all tables have
been analyzed, and '***' in the explain analyze plans marks places that
were manually edited for readability).
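
For reference, the fiddling mentioned above was with planner settings
along these lines (the values here are illustrative, not the exact ones
used):

    -- try to steer the planner away from the bad join order
    SET geqo = off;                 -- disable genetic query optimization
    SET join_collapse_limit = 20;   -- let the planner flatten more explicit JOINs
    SET from_collapse_limit = 20;   -- likewise for subqueries pulled into FROM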


Attachments

Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)

From: Tom Lane
Date:

Greg Sabino Mullane <greg@turnstep.com> writes:
> I don't have a full test case yet, but I did finally manage to get an
> explain analyze to finish in a sane amount of time on 8.2.5. Attached
> are two cleaned up explain analyze results, using the exact same data
> directory but different executables: one is 8.2.3 and returns as
> expected, the other is 8.2.5, which generates a slow plan despite any
> fiddling with geqo/join_collapse_limit, etc. The cost is the same, but
> it makes a wrong turn partway through the plan.

Is there a reason you rounded off most of the costs?  It looks like the
estimated costs of the two join types are nearly equal, and so it's pure
chance which one gets chosen.  The real problem seems to be the
misestimation here:

>                  ->Seq Scan on orders_smaller m (C=0..742 R=1) (AT=0..11 R=9247 L=1)
>                    Filter: ((order_number)::text !~~ '%.%'::text)

With a base scan estimate that's off by four orders of magnitude,
there's no reason at all to expect that the join plan above it will
be very suitable :-(

This might be a bug in the LIKE estimator (if so, it's in 8.2.3 as
well).  I don't have time to look closer right now, but can you show us
the pg_stats row for orders_smaller.order_number?
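
For reference, that row can be pulled with something like:

    SELECT *
      FROM pg_stats
     WHERE tablename = 'orders_smaller'
       AND attname = 'order_number';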

            regards, tom lane

Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)

From: Tom Lane
Date:

I wrote:
> This might be a bug in the LIKE estimator (if so, it's in 8.2.3 as
> well).  I don't have time to look closer right now, but can you show us
> the pg_stats row for orders_smaller.order_number?

Oh, never mind that ... on inspection, the NOT LIKE selectivity
estimator is obviously broken: patternsel() doesn't realize it needs
to negate the results of the operator when applying same to histogram
entries.
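
As an illustration, the symptom is visible in a bare scan using the
filter from the plan above (a sketch; orders_smaller is the table from
Greg's plan):

    EXPLAIN ANALYZE
    SELECT *
      FROM orders_smaller m
     WHERE (order_number)::text NOT LIKE '%.%';
    -- with the broken estimator the planner guessed rows=1 here,
    -- while the actual scan returned 9247 rows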

            regards, tom lane

Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)

From: Tom Lane
Date:

I wrote:
>> This might be a bug in the LIKE estimator (if so, it's in 8.2.3 as
>> well).  I don't have time to look closer right now, but can you show us
>> the pg_stats row for orders_smaller.order_number?

> Oh, never mind that ... on inspection, the NOT LIKE selectivity
> estimator is obviously broken: patternsel() doesn't realize it needs
> to negate the results of the operator when applying same to histogram
> entries.

I've applied a patch that should make this better:
http://archives.postgresql.org/pgsql-committers/2007-11/msg00101.php

            regards, tom lane

Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)

From: "Greg Sabino Mullane"
Date:

> Is there a reason you rounded off most of the costs?  It looks like the
> estimated costs of the two join types are nearly equal, and so it's pure
> chance which one gets chosen.

No real reason, it's just a post-processing script used to make explain
output a little more readable. I'll leave in all the sigfigs next time.

> This might be a bug in the LIKE estimator (if so, it's in 8.2.3 as
> well).  I don't have time to look closer right now, but can you show us
> the pg_stats row for orders_smaller.order_number?

I tried the patch you sent, with no change. However, I then changed the
default_statistics_target to 100, reanalyzed, and it came back with the
"good" plan. Trying this on the original larger query (which pulls from
tables with millions of rows, not the 10,000-row subsets I created) worked
fine too. Very odd. I confirmed that a lower stats target worked fine on
8.2.3, and then narrowed it down to 99 - at 98 and below on 8.2.5, the
"bad" plan is used, and at 99 and above, the "good" one is. I guess as a
rule of thumb we'll crank up the default statistics target on our 8.2
boxes. Any particular number recommended? Any reason why 99 is so magical?
I could have sworn I tried it with 100 last week and saw the bad plan.
Guess I should also boost the default target in my testing setups a bit
as well. Thanks for the quick patch, we'll definitely apply that as well
for safety.
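
For anyone following along, the target can be raised session-wide or
pinned on the one column before re-analyzing; a sketch:

    -- session-wide (or set it in postgresql.conf to make it stick):
    SET default_statistics_target = 100;
    ANALYZE orders_smaller;

    -- or per-column, which holds regardless of the GUC setting:
    ALTER TABLE orders_smaller ALTER COLUMN order_number SET STATISTICS 100;
    ANALYZE orders_smaller;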

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711081137
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
"Greg Sabino Mullane" <greg@turnstep.com> writes:
> I tried the patch you sent, with no change. However, I then changed the
> default_statistics_target to 100, reanalyzed, and it came back with the
> "good" plan. Trying this on the original larger query (which pulls from
> tables with millions of rows, not the 10,000-row subsets I created) worked
> fine too. Very odd. I confirmed that a lower stats target worked fine on
> 8.2.3, and then narrowed it down to 99 - at 98 and below on 8.2.5, the
> "bad" plan is used, and at 99 and above, the "good" one is. I guess as a
> rule of thumb we'll crank up the default statistics target on our 8.2
> boxes. Any particular number recommended? Any reason why 99 is so magical?

If there are 100 or more histogram entries it'll do the estimation by
counting how many of the histogram entries match the pattern, rather
than using the prefix-range-based estimator (which is pretty much
all-fantasy anyway for a pattern with leading % :-().

http://archives.postgresql.org/pgsql-committers/2006-09/msg00331.php

I'm too lazy to go check, but I think the stats target is interpreted as
the number of histogram bins rather than values, which is why you'd see
the switchover at 99 not 100.
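
If that guess is right, it can be checked directly from pg_stats (a
sketch; array_upper works on the pg_stats array columns):

    SELECT attname, array_upper(histogram_bounds, 1) AS histogram_entries
      FROM pg_stats
     WHERE tablename = 'orders_smaller'
       AND attname = 'order_number';
    -- a target of 99 bins would store 100 bound values, crossing the
    -- "100 or more entries" threshold described above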

> I could have sworn I
> tried it with 100 last week and saw the bad plan.

Without that patch, 8.2.x's NOT LIKE estimator is completely bogus
anyway :-(.  It has been broken right along --- I'm not sure why
your query produced a different plan in 8.2.3 than later, but it
wasn't as a result of changes here.

            regards, tom lane

Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)

From: "Greg Sabino Mullane"
Date:

Tom Lane replied:
> If there are 100 or more histogram entries it'll do the estimation by
> counting how many of the histogram entries match the pattern, rather
> than using the prefix-range-based estimator (which is pretty much
> all-fantasy anyway for a pattern with leading % :-().
>
> http://archives.postgresql.org/pgsql-committers/2006-09/msg00331.php

Ugh, that's some gotcha. Now that the patches are in place to fix the
planner problems, can I strongly recommend that an 8.2.6 release be made?
These are some serious planner problems, reported by real-world users,
and the only other option may be to go back to 8.1.

> I'm too lazy to go check, but I think the stats target is interpreted as
> the number of histogram bins rather than values, which is why you'd see
> the switchover at 99 not 100.

Can we switch the default_statistics_target to a default of 100? Is
there any reason not to do so at this point? Ten has always seemed
very low to me, and the 98/99 gotcha only makes the case for at
least 100 as the default stronger.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711132325
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
"Greg Sabino Mullane" <greg@turnstep.com> writes:
> Can we switch the default_statistics_target to a default of 100? Is
> there any reason not to do so at this point?

Other than a 10x increase in the cost of ANALYZE, and in the cost of
histogram-based operations in the planner?

It's been clear for quite a while that a stats target of 10 is often
too low, but no one has done the legwork to establish what a more
reasonable tradeoff point would be.  I'm not for increasing the overhead
by a factor of 10 ... or even a factor of 2 ... without some real
evidence about the implications.

I'd be inclined to approach it by first trying to establish what's a
sane default for operations unrelated to LIKE estimation.  If that comes
out as 50 or more, I'd just lower the LIKE threshold to whatever it
comes out at.  The 100 number was really just a guess in the first
place.

            regards, tom lane

Re: Planner problems in 8.2.4 and 8.2.5

From: Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> It's been clear for quite a while that a stats target of 10 is often
> too low, but no one has done the legwork to establish what a more
> reasonable tradeoff point would be.

Any ideas on what measurements would be interesting for this?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: Planner problems in 8.2.4 and 8.2.5

From: Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> It's been clear for quite a while that a stats target of 10 is often
>> too low, but no one has done the legwork to establish what a more
>> reasonable tradeoff point would be.

> Any ideas on what measurements would be interesting for this?

Time to run ANALYZE on large tables, and extra planning time for queries
of various complexities, weighed against whether you actually get a
better plan or not.

In a quick look at selfuncs.c, it appears that the worst planner hit
would be for eqjoin selectivities between non-unique columns (ie, those
having MCV lists).  The number of operations is O(N^2) if there are
N values in each MCV list.
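
To see how large those MCV lists actually get (a sketch):

    SELECT tablename, attname,
           array_upper(most_common_vals, 1) AS n_mcv
      FROM pg_stats
     WHERE most_common_vals IS NOT NULL
     ORDER BY n_mcv DESC;
    -- each list holds up to stats-target entries, so the eqjoin
    -- comparison loop above does roughly N*N operator calls per clause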

            regards, tom lane