Обсуждение: Failed assertion clauses != NIL

Поиск
Список
Период
Сортировка

Failed assertion clauses != NIL

От
Manuel Rigger
Дата:
Hi everyone,

when building PostgreSQL with -enable-cassert, executing the following
statements result in an assertion error:

CREATE TABLE t0(c0 boolean, c1 boolean, c2 boolean);
INSERT INTO t0 VALUES(FALSE, FALSE, FALSE);
CREATE STATISTICS s0 ON c0, c2 FROM t0;
ANALYZE;
SELECT * FROM t0 WHERE t0.c2 OR t0.c1 OR t0.c0;

I can reproduce this on the latest trunk version (7f33836).

Best,
Manuel

Stacktrace:

TRAP: FailedAssertion("clauses != NIL", File: "mcv.c", Line: 1551)
postgres: postgres db [local] SELECT(ExceptionalCondition+0x76)[0x5563ab44e4a6]
postgres: postgres db [local] SELECT(+0x3d6e91)[0x5563ab2e1e91]
postgres: postgres db [local]
SELECT(mcv_clauselist_selectivity+0x41)[0x5563ab2e47b1]
postgres: postgres db [local]
SELECT(statext_clauselist_selectivity+0x362)[0x5563ab2e1692]
postgres: postgres db [local]
SELECT(clauselist_selectivity+0x174)[0x5563ab227134]
postgres: postgres db [local]
SELECT(set_baserel_size_estimates+0x38)[0x5563ab22dc38]
postgres: postgres db [local] SELECT(+0x319198)[0x5563ab224198]
postgres: postgres db [local] SELECT(make_one_rel+0x100)[0x5563ab226150]
postgres: postgres db [local] SELECT(query_planner+0x15b)[0x5563ab24b3fb]
postgres: postgres db [local] SELECT(+0x34531c)[0x5563ab25031c]
postgres: postgres db [local] SELECT(subquery_planner+0xd39)[0x5563ab253389]
postgres: postgres db [local] SELECT(standard_planner+0x117)[0x5563ab254567]
postgres: postgres db [local] SELECT(pg_plan_query+0x39)[0x5563ab320469]
postgres: postgres db [local] SELECT(pg_plan_queries+0x46)[0x5563ab320556]
postgres: postgres db [local] SELECT(+0x4158cf)[0x5563ab3208cf]
postgres: postgres db [local] SELECT(PostgresMain+0x14ef)[0x5563ab32228f]
postgres: postgres db [local] SELECT(+0x38d324)[0x5563ab298324]
postgres: postgres db [local] SELECT(PostmasterMain+0xe01)[0x5563ab299241]
postgres: postgres db [local] SELECT(main+0x4b8)[0x5563aafc1048]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f128537eb6b]
postgres: postgres db [local] SELECT(_start+0x2a)[0x5563aafc10ea]
2019-11-19 13:35:55.103 CET [22654] LOG:  server process (PID 23121)
was terminated by signal 6: Aborted
2019-11-19 13:35:55.103 CET [22654] DETAIL:  Failed process was
running: SELECT * FROM t0 WHERE t0.c2 OR t0.c1 OR t0.c0;



Re: Failed assertion clauses != NIL

От
Dmitry Dolgov
Дата:
> On Tue, Nov 19, 2019 at 01:50:51PM +0100, Manuel Rigger wrote:
>
> when building PostgreSQL with -enable-cassert, executing the following
> statements result in an assertion error:
>
> CREATE TABLE t0(c0 boolean, c1 boolean, c2 boolean);
> INSERT INTO t0 VALUES(FALSE, FALSE, FALSE);
> CREATE STATISTICS s0 ON c0, c2 FROM t0;
> ANALYZE;
> SELECT * FROM t0 WHERE t0.c2 OR t0.c1 OR t0.c0;

Yes, I can reproduce it too. mcv_get_match_bitmap expects that
stat_clauses will not be empty, but looks like in this situation
stat_clauses is indeed NIL. clauselist_selectivity_simple right before
actually doesn't insist on stat_clauses being non empty, probably it's
just too strict assert.



Re: Failed assertion clauses != NIL

От
Daniel Gustafsson
Дата:
> On 19 Nov 2019, at 14:38, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
>> On Tue, Nov 19, 2019 at 01:50:51PM +0100, Manuel Rigger wrote:
>>
>> when building PostgreSQL with -enable-cassert, executing the following
>> statements result in an assertion error:
>>
>> CREATE TABLE t0(c0 boolean, c1 boolean, c2 boolean);
>> INSERT INTO t0 VALUES(FALSE, FALSE, FALSE);
>> CREATE STATISTICS s0 ON c0, c2 FROM t0;
>> ANALYZE;
>> SELECT * FROM t0 WHERE t0.c2 OR t0.c1 OR t0.c0;
>
> Yes, I can reproduce it too. mcv_get_match_bitmap expects that
> stat_clauses will not be empty, but looks like in this situation
> stat_clauses is indeed NIL. clauselist_selectivity_simple right before
> actually doesn't insist on stat_clauses being non empty, probably it's
> just too strict assert.

I might be missing something, but if the clause list is NIL, wouldn't it better
to exit earlier from statext_mcv_clauselist_selectivity rather than relax the
Assertion since we will get a 1.0 estimate either way?

cheers ./daniel

--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1267,6 +1267,9 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
                listidx++;
        }

+       if (stat_clauses == NIL)
+               return 1.0;
+
        /*
         * First compute "simple" selectivity, i.e. without the extended
         * statistics, and essentially assuming independence of the




Re: Failed assertion clauses != NIL

От
Dmitry Dolgov
Дата:
> On Tue, Nov 19, 2019 at 02:45:54PM +0100, Daniel Gustafsson wrote:
> > On 19 Nov 2019, at 14:38, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> >> On Tue, Nov 19, 2019 at 01:50:51PM +0100, Manuel Rigger wrote:
> >>
> >> when building PostgreSQL with -enable-cassert, executing the following
> >> statements result in an assertion error:
> >>
> >> CREATE TABLE t0(c0 boolean, c1 boolean, c2 boolean);
> >> INSERT INTO t0 VALUES(FALSE, FALSE, FALSE);
> >> CREATE STATISTICS s0 ON c0, c2 FROM t0;
> >> ANALYZE;
> >> SELECT * FROM t0 WHERE t0.c2 OR t0.c1 OR t0.c0;
> >
> > Yes, I can reproduce it too. mcv_get_match_bitmap expects that
> > stat_clauses will not be empty, but looks like in this situation
> > stat_clauses is indeed NIL. clauselist_selectivity_simple right before
> > actually doesn't insist on stat_clauses being non empty, probably it's
> > just too strict assert.
>
> I might be missing something, but if the clause list is NIL, wouldn't it better
> to exit earlier from statext_mcv_clauselist_selectivity rather than relax the
> Assertion since we will get a 1.0 estimate either way?
>
> cheers ./daniel
>
> --- a/src/backend/statistics/extended_stats.c
> +++ b/src/backend/statistics/extended_stats.c
> @@ -1267,6 +1267,9 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
>                 listidx++;
>         }
>
> +       if (stat_clauses == NIL)
> +               return 1.0;
> +
>         /*
>          * First compute "simple" selectivity, i.e. without the extended
>          * statistics, and essentially assuming independence of the
>

Yep, seems like a reasonable thing to do, especially since it's already
like that in a few other places in this function.



Re: Failed assertion clauses != NIL

От
Tomas Vondra
Дата:
On Tue, Nov 19, 2019 at 02:45:54PM +0100, Daniel Gustafsson wrote:
>> On 19 Nov 2019, at 14:38, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>>
>>> On Tue, Nov 19, 2019 at 01:50:51PM +0100, Manuel Rigger wrote:
>>>
>>> when building PostgreSQL with -enable-cassert, executing the following
>>> statements result in an assertion error:
>>>
>>> CREATE TABLE t0(c0 boolean, c1 boolean, c2 boolean);
>>> INSERT INTO t0 VALUES(FALSE, FALSE, FALSE);
>>> CREATE STATISTICS s0 ON c0, c2 FROM t0;
>>> ANALYZE;
>>> SELECT * FROM t0 WHERE t0.c2 OR t0.c1 OR t0.c0;
>>
>> Yes, I can reproduce it too. mcv_get_match_bitmap expects that
>> stat_clauses will not be empty, but looks like in this situation
>> stat_clauses is indeed NIL. clauselist_selectivity_simple right before
>> actually doesn't insist on stat_clauses being non empty, probably it's
>> just too strict assert.
>
>I might be missing something, but if the clause list is NIL, wouldn't it better
>to exit earlier from statext_mcv_clauselist_selectivity rather than relax the
>Assertion since we will get a 1.0 estimate either way?
>

Hmmm, this is actually a thinko in how we match stats to clauses.
We simply extract attnums from Vars in each clause, and then pick the
statistic matching at least two of those attnums (and we pick the one
matching the most attnums, but that does not matter here).

And then we go and pick all the clauses covered by the statistic,
assuming that we'll get some matching clauses. Unfortunately, that fails
here because it's essentially just a single OR clause. And it references
attributes that are not covered by the statistic.

So we get clauses_attnums = {1,2,3}, we pick the statistic which however
only covers {1,3}, and then we fail because the clause is

   c0 OR c1 OD c2

which is not actually covered by the statistic, because of c1. Kaboom!

Yes, adding the condition to statext_mcv_clauselist_selectivity() would
make this go away, and it's about the  simplest solution.

Ideally, we'd be able to improve the statistics matching to recognize
it has to match all three attributes to match the clause, which in this
case would mean the OR clause is passed to clause_selectivity, and we do
some magic with extended statistics there.

I'll see how complex / backpatchable that would be.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Failed assertion clauses != NIL

От
Dean Rasheed
Дата:
On Tue, 19 Nov 2019 at 15:08, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> Yes, adding the condition to statext_mcv_clauselist_selectivity() would
> make this go away, and it's about the  simplest solution.
>

It's probably worth going a little further, and verifying that
stat_clauses references at least two attributes. We do that further up
for the original clause list, but it may not be true for the filtered
list. For example, given a WHERE clause like

  c0 > 0 AND c0 < 10 AND (c0 = 0 OR c1 = 1 OR c2 = 2)

and stats on (c0, c1), stat_clauses would include the first 2 clauses,
but they only reference 1 column, so it would be preferable to not use
the multivariate stats in that case.


> Ideally, we'd be able to improve the statistics matching to recognize
> it has to match all three attributes to match the clause, which in this
> case would mean the OR clause is passed to clause_selectivity, and we do
> some magic with extended statistics there.
>
> I'll see how complex / backpatchable that would be.
>

Yes, that seems like a worthwhile thing to do, but I think it goes
beyond what would normally be back-patched. It would really be a
feature enhancement rather than a bug fix.

Regards,
Dean



Re: Failed assertion clauses != NIL

От
Tomas Vondra
Дата:
On Thu, Nov 21, 2019 at 12:29:39PM +0000, Dean Rasheed wrote:
>On Tue, 19 Nov 2019 at 15:08, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> Yes, adding the condition to statext_mcv_clauselist_selectivity() would
>> make this go away, and it's about the  simplest solution.
>>
>
>It's probably worth going a little further, and verifying that
>stat_clauses references at least two attributes. We do that further up
>for the original clause list, but it may not be true for the filtered
>list. For example, given a WHERE clause like
>
>  c0 > 0 AND c0 < 10 AND (c0 = 0 OR c1 = 1 OR c2 = 2)
>
>and stats on (c0, c1), stat_clauses would include the first 2 clauses,
>but they only reference 1 column, so it would be preferable to not use
>the multivariate stats in that case.
>

Yeah, good point.

Not sure what to do about this:

   (c0 > 0 OR c2 < 10) AND (c0 = 0 OR c1 = 1 OR c2 = 2)

In that case we match just the first OR clause, but it references two
attributes. I guess we should ignore that too and handle that later just
like the regular OR clauses.

>
>> Ideally, we'd be able to improve the statistics matching to recognize
>> it has to match all three attributes to match the clause, which in this
>> case would mean the OR clause is passed to clause_selectivity, and we do
>> some magic with extended statistics there.
>>
>> I'll see how complex / backpatchable that would be.
>>
>
>Yes, that seems like a worthwhile thing to do, but I think it goes
>beyond what would normally be back-patched. It would really be a
>feature enhancement rather than a bug fix.
>

True. I don't have a patch yet, but it's likely to be a bit more
invasive than I'd like to backpatch.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Failed assertion clauses != NIL

От
Tomas Vondra
Дата:
Hi,

Attached are two patched, related to this bug report.


0001 - Fix choose_best_statistics to check clauses individually
---------------------------------------------------------------

This modifies the choose_best_statistics function to properly check
which clauses are actually covered by each statistic object, and only
use attnums from those.

The patch ended up pretty small, because we already have all the
necessary info (per-clause attnums) precalculated. Which means this
should not be much more expensive than before.

The main drawback is that this does change signature of a function
defined in statistics.h - we have to pass more info (per-clause bitmaps
and info which clauses are already estimated). Which means ABI break.

I'm not sure how likely it is that external code is calling this
function, but the probability is non-zero. So maybe even if the patch is
fairly small, in backbranches we should use the simple fix with just
returning if the list is NIL.


0002 - WIP: Use extended statistics to estimate OR clauses
----------------------------------------------------------

No matter what 0001 does, it's clear the current code fails to handle OR
clauses that are not fully covered by an extended statistic. For AND
clauses that's not an issue - we simply estimate the covered ones, and
then add the remaining ones by assuming independence.

But clauselist_selectivity sees OR clauses as a single single clause,
and clause_selectivity() simply used the (s1+s2-s1*s2) formula without
considering extended statistics for is_orclause. (For is_andclause we
call clauselist_selectivity recursively, which does consider extended
stats, of course.)

This commit addresses this by calling a clauselist_selectivity variant
for clauses connected by OR, and calling it from the is_orclause branch.
It then requires a bunch of changes elsewhere, to propagate the is_or
flag properly etc.

This is clearly not a thing we could/want to backpatch, and at this
point it's not anywhere close to committable. It's more a WIP patch
highlighting places that will need tweaking to make this work.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: Failed assertion clauses != NIL

От
Dean Rasheed
Дата:
On Sun, 24 Nov 2019 at 23:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> Hi,
>
> Attached are two patched, related to this bug report.
>
>
> 0001 - Fix choose_best_statistics to check clauses individually
> ---------------------------------------------------------------
>
> This modifies the choose_best_statistics function to properly check
> which clauses are actually covered by each statistic object, and only
> use attnums from those.
>
> The patch ended up pretty small, because we already have all the
> necessary info (per-clause attnums) precalculated. Which means this
> should not be much more expensive than before.
>
> The main drawback is that this does change signature of a function
> defined in statistics.h - we have to pass more info (per-clause bitmaps
> and info which clauses are already estimated). Which means ABI break.
>
> I'm not sure how likely it is that external code is calling this
> function, but the probability is non-zero. So maybe even if the patch is
> fairly small, in backbranches we should use the simple fix with just
> returning if the list is NIL.
>

On a quick read-through that algorithm makes a lot more sense. It
seems pretty unlikely that anyone would be using
choose_best_statistics() anywhere else, so I think maybe it's fine to
change that in back-branches.

A couple of comments:

1). I think you should pass estimatedClauses to
choose_best_statistics() as a pure input parameter (i.e., remove one
"*"), since it doesn't (and must not) modify that set. In fact, on
closer inspection, I don't think you need to pass it to
choose_best_statistics() at all, since its callers already check
clauses against estimatedClauses. Therefore, in
choose_best_statistics(), incompatible and already-estimated clauses
both appear the same (as NULL/empty attribute sets), and therefore the
estimatedClauses check will never be tripped.

2). The new parameter "clauses" should probably be called something
like "clause_attnums" or some such, to better reflect what it actually
is. And it should be documented that NULL values represent
incompatible/already-estimated clauses.

3). In statext_mcv_clauselist_selectivity(), the check for at least 2
attributes is no longer needed, because choose_best_statistics() now
does that, so there's also no need to compute clauses_attnums there.

Regards,
Dean



Re: Failed assertion clauses != NIL

От
Tomas Vondra
Дата:
On Tue, Nov 26, 2019 at 10:56:33AM +0000, Dean Rasheed wrote:
>On Sun, 24 Nov 2019 at 23:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> Hi,
>>
>> Attached are two patched, related to this bug report.
>>
>>
>> 0001 - Fix choose_best_statistics to check clauses individually
>> ---------------------------------------------------------------
>>
>> This modifies the choose_best_statistics function to properly check
>> which clauses are actually covered by each statistic object, and only
>> use attnums from those.
>>
>> The patch ended up pretty small, because we already have all the
>> necessary info (per-clause attnums) precalculated. Which means this
>> should not be much more expensive than before.
>>
>> The main drawback is that this does change signature of a function
>> defined in statistics.h - we have to pass more info (per-clause bitmaps
>> and info which clauses are already estimated). Which means ABI break.
>>
>> I'm not sure how likely it is that external code is calling this
>> function, but the probability is non-zero. So maybe even if the patch is
>> fairly small, in backbranches we should use the simple fix with just
>> returning if the list is NIL.
>>
>
>On a quick read-through that algorithm makes a lot more sense. It
>seems pretty unlikely that anyone would be using
>choose_best_statistics() anywhere else, so I think maybe it's fine to
>change that in back-branches.
>
>A couple of comments:
>
>1). I think you should pass estimatedClauses to
>choose_best_statistics() as a pure input parameter (i.e., remove one
>"*"), since it doesn't (and must not) modify that set.

Yeah, good point.

>In fact, on closer inspection, I don't think you need to pass it to
>choose_best_statistics() at all, since its callers already check
>clauses against estimatedClauses. Therefore, in
>choose_best_statistics(), incompatible and already-estimated clauses
>both appear the same (as NULL/empty attribute sets), and therefore the
>estimatedClauses check will never be tripped.
>

Right, but I'm thinking about the patch that allows applying multiple
statistics. With that applied, this changes to a while loop - and we'll
either have to rebuild the list_attnums or pass the bitmap.

>2). The new parameter "clauses" should probably be called something
>like "clause_attnums" or some such, to better reflect what it actually
>is. And it should be documented that NULL values represent
>incompatible/already-estimated clauses.
>

Yes, agreed.

>3). In statext_mcv_clauselist_selectivity(), the check for at least 2
>attributes is no longer needed, because choose_best_statistics() now
>does that, so there's also no need to compute clauses_attnums there.
>

Good point. I'll replace that with an assert.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Failed assertion clauses != NIL

От
Tomas Vondra
Дата:
On Tue, Nov 26, 2019 at 06:41:31PM +0100, Tomas Vondra wrote:
>On Tue, Nov 26, 2019 at 10:56:33AM +0000, Dean Rasheed wrote:
>>On Sun, 24 Nov 2019 at 23:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>
>>>Hi,
>>>
>>>Attached are two patched, related to this bug report.
>>>
>>>
>>>0001 - Fix choose_best_statistics to check clauses individually
>>>---------------------------------------------------------------
>>>
>>>This modifies the choose_best_statistics function to properly check
>>>which clauses are actually covered by each statistic object, and only
>>>use attnums from those.
>>>
>>>The patch ended up pretty small, because we already have all the
>>>necessary info (per-clause attnums) precalculated. Which means this
>>>should not be much more expensive than before.
>>>
>>>The main drawback is that this does change signature of a function
>>>defined in statistics.h - we have to pass more info (per-clause bitmaps
>>>and info which clauses are already estimated). Which means ABI break.
>>>
>>>I'm not sure how likely it is that external code is calling this
>>>function, but the probability is non-zero. So maybe even if the patch is
>>>fairly small, in backbranches we should use the simple fix with just
>>>returning if the list is NIL.
>>>
>>
>>On a quick read-through that algorithm makes a lot more sense. It
>>seems pretty unlikely that anyone would be using
>>choose_best_statistics() anywhere else, so I think maybe it's fine to
>>change that in back-branches.
>>
>>A couple of comments:
>>
>>1). I think you should pass estimatedClauses to
>>choose_best_statistics() as a pure input parameter (i.e., remove one
>>"*"), since it doesn't (and must not) modify that set.
>
>Yeah, good point.
>
>>In fact, on closer inspection, I don't think you need to pass it to
>>choose_best_statistics() at all, since its callers already check
>>clauses against estimatedClauses. Therefore, in
>>choose_best_statistics(), incompatible and already-estimated clauses
>>both appear the same (as NULL/empty attribute sets), and therefore the
>>estimatedClauses check will never be tripped.
>>
>
>Right, but I'm thinking about the patch that allows applying multiple
>statistics. With that applied, this changes to a while loop - and we'll
>either have to rebuild the list_attnums or pass the bitmap.
>

On second thought, that's not quite relevant in backbranches, so I've
removed the parameters for now. I'll add it in the patch that adds
support for multiple stats.

Attached is the 0001 part, addressing (hopefully) all the comments.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: Failed assertion clauses != NIL

От
Dean Rasheed
Дата:
On Wed, 27 Nov 2019 at 00:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> >On Tue, Nov 26, 2019 at 10:56:33AM +0000, Dean Rasheed wrote:
> >
> >>In fact, on closer inspection, I don't think you need to pass it to
> >>choose_best_statistics() at all, since its callers already check
> >>clauses against estimatedClauses. Therefore, in
> >>choose_best_statistics(), incompatible and already-estimated clauses
> >>both appear the same (as NULL/empty attribute sets), and therefore the
> >>estimatedClauses check will never be tripped.
> >
> >Right, but I'm thinking about the patch that allows applying multiple
> >statistics. With that applied, this changes to a while loop - and we'll
> >either have to rebuild the list_attnums or pass the bitmap.
>
> On second thought, that's not quite relevant in backbranches, so I've
> removed the parameters for now. I'll add it in the patch that adds
> support for multiple stats.
>

Or alternatively, that patch could just NULL-out the relevant
list_attnums[] array entry once the corresponding clause has been
estimated, which would avoid needing to change
choose_best_statistics() again.

> Attached is the 0001 part, addressing (hopefully) all the comments.
>

I just spotted a trivial comment typo in dependencies_clauselist_selectivity():

+ /*
+ * We expect the bitmaps ton contain a single attribute number.
+ */
+ attnum = bms_singleton_member(list_attnums[listidx]);

s/ton/to/

Also, in statext_mcv_clauselist_selectivity(), clauses_attnums is now
unused, so there's no point in computing it (unless you wanted to add
the Assert() you talked about, but I don't think it's really
necessary).

Otherwise it looks good to me.

Regards,
Dean



Re: Failed assertion clauses != NIL

От
Tomas Vondra
Дата:
On Wed, Nov 27, 2019 at 09:26:11AM +0000, Dean Rasheed wrote:
>On Wed, 27 Nov 2019 at 00:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> >On Tue, Nov 26, 2019 at 10:56:33AM +0000, Dean Rasheed wrote:
>> >
>> >>In fact, on closer inspection, I don't think you need to pass it to
>> >>choose_best_statistics() at all, since its callers already check
>> >>clauses against estimatedClauses. Therefore, in
>> >>choose_best_statistics(), incompatible and already-estimated clauses
>> >>both appear the same (as NULL/empty attribute sets), and therefore the
>> >>estimatedClauses check will never be tripped.
>> >
>> >Right, but I'm thinking about the patch that allows applying multiple
>> >statistics. With that applied, this changes to a while loop - and we'll
>> >either have to rebuild the list_attnums or pass the bitmap.
>>
>> On second thought, that's not quite relevant in backbranches, so I've
>> removed the parameters for now. I'll add it in the patch that adds
>> support for multiple stats.
>>
>
>Or alternatively, that patch could just NULL-out the relevant
>list_attnums[] array entry once the corresponding clause has been
>estimated, which would avoid needing to change
>choose_best_statistics() again.
>
>> Attached is the 0001 part, addressing (hopefully) all the comments.
>>
>
>I just spotted a trivial comment typo in dependencies_clauselist_selectivity():
>
>+ /*
>+ * We expect the bitmaps ton contain a single attribute number.
>+ */
>+ attnum = bms_singleton_member(list_attnums[listidx]);
>
>s/ton/to/
>
>Also, in statext_mcv_clauselist_selectivity(), clauses_attnums is now
>unused, so there's no point in computing it (unless you wanted to add
>the Assert() you talked about, but I don't think it's really
>necessary).
>
>Otherwise it looks good to me.
>

I've pushed this, after adding a regression test for OR clauses that are
not fully covered by the MCV statistic. The existing regression queries
test almost exclusively AND clauses, so I've considered adding a couple
more, but then I reliazed the support for OR clauses is somewhat limited
so the patch improving support for OR clauses is a better opportunity.

And now that I look at Dean's message again, I realize I forgot to
remove the unnecessary clauses_attnum variable, so I'll fix that.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services