Обсуждение: GIN indexscans versus equality selectivity estimation

Поиск

Список

Период

Сортировка

GIN indexscans versus equality selectivity estimation

От

Tom Lane

Дата:

09 января 2011 г., 19:38:16

Whilst fooling around with GIN over the past few days, I noticed the
following rather surprising behavior:

regression=# create table t1 (f1 int[]);
CREATE TABLE
regression=# insert into t1 values (array[42]);
INSERT 0 1
regression=# create index ti1 on t1 using gin (f1);
CREATE INDEX
regression=# set enable_seqscan to 0;
SET
regression=# explain select * from t1 where f1 = array[42];                             QUERY PLAN
        
 
-----------------------------------------------------------------------Seq Scan on t1
(cost=10000000000.00..10000000001.01rows=1 width=32)  Filter: (f1 = '{42}'::integer[])
 
(2 rows)

The system absolutely will not use the index in this state, even though
the GIN array opclass does support equality.  I dug around and
eventually figured out why:

1. We don't have any pg_stats statistics in this state; but we do know
reltuples = 1, since the CREATE INDEX helpfully updated the pg_class row
with that information.  Without stats, eqsel() falls back to a
selectivity estimate of 1/numdistinct; and without stats,
get_variable_numdistinct estimates the number of distinct values as
equal to reltuples, if that's a small number.  The upshot is that we get
a selectivity estimate of exactly 1.0 for the equality condition.

2. In indxpath.c, we have the following comment and code about selecting
which paths to generate bitmap scan paths for:
    * ... Also, pick out the ones that might be useful    * as bitmap scans.  For that, we must discard indexes that
don'tsupport    * bitmap scans, and we also are only interested in paths that have some    * selectivity; we should
discardanything that was generated solely for    * ordering purposes.
 
       if (ipath->indexinfo->amhasgetbitmap &&           ipath->indexselectivity < 1.0 &&
!ScanDirectionIsBackward(ipath->indexscandir))          bitindexpaths = lappend(bitindexpaths, ipath);
 

Since GIN doesn't support plain indexscans, only bitmap scans, this
means that the planner simply won't use a GIN index unless the estimated
selectivity is less than 1.0.

There are several things we might choose to do about this:

1. Do nothing.  The issue seems quite unlikely to affect anyone in the
field, since in fact "use a seqscan" is probably the right answer
anytime reltuples = 1; and anyway using a GIN index for plain equality
is a corner case to begin with.  However, it could confuse people who
were doing testing (it confused me!).

2. Hack the selectivity estimators so that we don't get exactly 1.0
for this sort of situation.  It might be plausible for
get_variable_numdistinct to set a floor of 2 on its result, for example;
or we could hack eqsel() to bound the no-stats estimate to a bit less
than 1.

3. Change the indxpath.c code to have some different way of determining
whether a path is sensible to use as a bitmap scan.  We could for
instance make the test be "path has some indexquals and/or index is
partial".  Or maybe there should be an exception in the case where the
index type doesn't support plain indexscan.

I'm not really sold on any of these alternatives.  Thoughts?
        regards, tom lane

Re: GIN indexscans versus equality selectivity estimation

От

Josh Berkus

Дата:

09 января 2011 г., 21:06:28

On 1/9/11 3:38 PM, Tom Lane wrote:
> 1. Do nothing.  The issue seems quite unlikely to affect anyone in the
> field, since in fact "use a seqscan" is probably the right answer
> anytime reltuples = 1; and anyway using a GIN index for plain equality
> is a corner case to begin with.  However, it could confuse people who
> were doing testing (it confused me!).

+1.  It's an unexpected result, but not actually a bad one.  It just
doesn't seem worth messing with code which works in production just to
help testing.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com

Re: GIN indexscans versus equality selectivity estimation

От

Robert Haas

Дата:

09 января 2011 г., 21:35:48

On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> or we could hack eqsel() to bound the no-stats estimate to a bit less
> than 1.

This seems like a pretty sensible thing to do.  I can't immediately
imagine a situation in which 1.0 is a sensible selectivity estimate in
the no-stats case and 0.90 (say) is a major regression.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: GIN indexscans versus equality selectivity estimation

От

Tom Lane

Дата:

10 января 2011 г., 11:25:20

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> or we could hack eqsel() to bound the no-stats estimate to a bit less
>> than 1.

> This seems like a pretty sensible thing to do.  I can't immediately
> imagine a situation in which 1.0 is a sensible selectivity estimate in
> the no-stats case and 0.90 (say) is a major regression.

After sleeping on it, that seems like my least favorite option.  It's
basically a kluge, as is obvious because there's no principled way to
choose what the bound is (or the minimum result from
get_variable_numdistinct, if we were to hack it there).  I'm currently
leaning to the idea of tweaking the logic in indxpath.c; in particular,
why wouldn't it be a good idea to force consideration of the bitmap path
if the index type hasn't got amgettuple?  If we don't, then we've
completely wasted the effort spent up to that point inside
find_usable_indexes.

Or we could just ignore the issue; as Josh says, that's not an
unreasonable option.  The particular case I ran into is certainly not
too compelling.  I'm a bit worried though that there might be other
cases where the estimator comes up with 1.0 selectivity but it'd still
be worth considering a bitmap scan.
        regards, tom lane

Re: GIN indexscans versus equality selectivity estimation

От

Josh Berkus

Дата:

10 января 2011 г., 14:04:19

On 1/10/11 7:25 AM, Tom Lane wrote:
> I'm a bit worried though that there might be other
> cases where the estimator comes up with 1.0 selectivity but it'd still
> be worth considering a bitmap scan.

Well, I think the answer is to apply the other fixes, and test.  If
there are other cases of selectivity=1.0, they'll show up.  People are
pretty fast to complain if indexes aren't used, and we have a good
production test case available once you implement the other operators.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com

Re: GIN indexscans versus equality selectivity estimation

От

Tom Lane

Дата:

10 января 2011 г., 21:09:33

Josh Berkus <josh@agliodbs.com> writes:
> On 1/10/11 7:25 AM, Tom Lane wrote:
>> I'm a bit worried though that there might be other
>> cases where the estimator comes up with 1.0 selectivity but it'd still
>> be worth considering a bitmap scan.

> Well, I think the answer is to apply the other fixes, and test.  If
> there are other cases of selectivity=1.0, they'll show up.  People are
> pretty fast to complain if indexes aren't used, and we have a good
> production test case available once you implement the other operators.

"Implement the other operators"?  I don't think we're on the same page
here.  What I'm talking about is a one-line change in indxpath.c to not
short-circuit consideration of a bitmap indexscan.
        regards, tom lane

Re: GIN indexscans versus equality selectivity estimation

От

Robert Haas

Дата:

10 января 2011 г., 21:18:55

On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> or we could hack eqsel() to bound the no-stats estimate to a bit less
>>> than 1.
>
>> This seems like a pretty sensible thing to do.  I can't immediately
>> imagine a situation in which 1.0 is a sensible selectivity estimate in
>> the no-stats case and 0.90 (say) is a major regression.
>
> After sleeping on it, that seems like my least favorite option.  It's
> basically a kluge, as is obvious because there's no principled way to
> choose what the bound is (or the minimum result from
> get_variable_numdistinct, if we were to hack it there).

Well, the general problem is that we have no reasonable way of
handling planning uncertainty.  We have no way of throwing our hands
up in the air and saying "I really have no clue how many rows are
going to come out of that node"; as far as the rest of the planning
process is concerned, a selectivity estimate of 0.005 based on
<column> = <some MCV with a frequency of 0.005> is exactly identical
to one that results from a completely inscrutable equality condition.
So while I agree with you that there's no particular principled way to
choose the exact value, that doesn't strike me as a compelling
argument against fixing some value.  ISTM that selectivity estimates
of exactly 0 and exactly 1 ought to be viewed with a healthy dose of
suspicion.

> I'm currently
> leaning to the idea of tweaking the logic in indxpath.c; in particular,
> why wouldn't it be a good idea to force consideration of the bitmap path
> if the index type hasn't got amgettuple?  If we don't, then we've
> completely wasted the effort spent up to that point inside
> find_usable_indexes.

I guess the obvious question is: why wouldn't it be a good idea to
force consideration of the bitmap path even if the index type DOES
have amgettuple?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: GIN indexscans versus equality selectivity estimation

От

Tom Lane

Дата:

10 января 2011 г., 21:34:43

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 10, 2011 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm currently
>> leaning to the idea of tweaking the logic in indxpath.c; in particular,
>> why wouldn't it be a good idea to force consideration of the bitmap path
>> if the index type hasn't got amgettuple? �If we don't, then we've
>> completely wasted the effort spent up to that point inside
>> find_usable_indexes.

> I guess the obvious question is: why wouldn't it be a good idea to
> force consideration of the bitmap path even if the index type DOES
> have amgettuple?

Well, the motivation is what the code comment said: not to waste time
uselessly considering the bitmap form of an indexscan whose only reason
to live was to produce output sorted in a particular way.  That's
irrelevant for GIN of course, but it's entirely relevant for btree.

It might be just useless over-optimization, but I don't think so --
choose_bitmap_and is O(N^2) in the number of paths submitted to it,
so adding a lot of uninteresting paths doesn't seem smart.

A small variant of the approach would be to only reject paths that have
non-empty pathkeys.  That's not a *sufficient* condition, because a path
could have both pathkeys and good selectivity --- but it could be added
onto the selectivity test.
        regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: GIN indexscans versus equality selectivity estimation