Обсуждение: PATCH: index-only scans with partial indexes

Поиск

Список

Период

Сортировка

PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

10 июля 2015 г., 21:29:54

Hi,

currently partial indexes end up not using index only scans in most
cases, because check_index_only() is overly conservative, as explained
in this comment:

  * XXX this is overly conservative for partial indexes, since we will
  * consider attributes involved in the index predicate as required even
  * though the predicate won't need to be checked at runtime. (The same
  * is true for attributes used only in index quals, if we are certain
  * that the index is not lossy.)  However, it would be quite expensive
  * to determine that accurately at this point, so for now we take the
  * easy way out.

In other words, unless you include columns from the index predicate to
the index, the planner will decide index only scans are not possible.
Which is a bit unfortunate, because those columns are not needed at
runtime, and will only increase the index size (and the main benefit of
partial indexes is size reduction).

The attached patch fixes this by only considering clauses that are not
implied by the index predicate. The effect is simple:

     create table t as select i as a, i as b from
                       generate_series(1,10000000) s(i);

     create index tidx_partial on t(b) where a > 1000 and a < 2000;

     vacuum freeze t;
     analyze t;

explain analyze select count(b) from t where a > 1000 and a < 2000;

                                 QUERY PLAN
-----------------------------------------------------------------------
  Aggregate  (cost=39.44..39.45 rows=1 width=4)
             (actual time=8.350..8.354 rows=1 loops=1)
    ->  Index Scan using tidx_partial on t
             (cost=0.28..37.98 rows=585 width=4)
             (actual time=0.034..4.368 rows=999 loops=1)
  Planning time: 0.197 ms
  Execution time: 8.441 ms
(4 rows)

explain analyze select count(b) from t where a > 1000 and a < 2000;

                                 QUERY PLAN
-----------------------------------------------------------------------
  Aggregate  (cost=33.44..33.45 rows=1 width=4)
             (actual time=8.019..8.023 rows=1 loops=1)
    ->  Index Only Scan using tidx_partial on t
             (cost=0.28..31.98 rows=585 width=4)
             (actual time=0.036..4.165 rows=999 loops=1)
          Heap Fetches: 0
  Planning time: 0.188 ms
  Execution time: 8.106 ms
(5 rows)


I've done a bunch of tests, and I do see small (hardly noticeable)
increase in planning time with long list of WHERE clauses, because all
those need to be checked against the index predicate. Not sure if this
is what's meant by 'quite expensive' in the comment. Moreover, this was
more than compensated by the IOS benefits (even with everything in RAM).

But maybe it's possible to fix that somehow? For example, we're
certainly doing those checks elsewhere when deciding which clauses need
to be evaluated at run-time, so maybe we could cache that somehow?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

partial-index-only-scan-v1.patch

Re: PATCH: index-only scans with partial indexes

От

Tom Lane

Дата:

10 июля 2015 г., 23:43:37

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> currently partial indexes end up not using index only scans in most 
> cases, because check_index_only() is overly conservative, as explained 
> in this comment:
> ...

> I've done a bunch of tests, and I do see small (hardly noticeable) 
> increase in planning time with long list of WHERE clauses, because all 
> those need to be checked against the index predicate. Not sure if this 
> is what's meant by 'quite expensive' in the comment. Moreover, this was 
> more than compensated by the IOS benefits (even with everything in RAM).

> But maybe it's possible to fix that somehow? For example, we're 
> certainly doing those checks elsewhere when deciding which clauses need 
> to be evaluated at run-time, so maybe we could cache that somehow?

The key problem here is that you're doing those proofs vastly earlier than
before, for indexes that might not get used at all in the final plan.
If you do some tests with multiple partial indexes you will probably see
a bigger planning-time penalty.

Perhaps we should bite the bullet and do it anyway, but I'm pretty
suspicious of any claim that the planning cost is minimal.
        regards, tom lane

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

11 июля 2015 г., 15:01:09

Hi,

On 07/10/2015 10:43 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> currently partial indexes end up not using index only scans in most
>> cases, because check_index_only() is overly conservative, as explained
>> in this comment:
>> ...
>
>> I've done a bunch of tests, and I do see small (hardly noticeable)
>> increase in planning time with long list of WHERE clauses, because all
>> those need to be checked against the index predicate. Not sure if this
>> is what's meant by 'quite expensive' in the comment. Moreover, this was
>> more than compensated by the IOS benefits (even with everything in RAM).
>
>> But maybe it's possible to fix that somehow? For example, we're
>> certainly doing those checks elsewhere when deciding which clauses need
>> to be evaluated at run-time, so maybe we could cache that somehow?
>
> The key problem here is that you're doing those proofs vastly earlier
> than before, for indexes that might not get used at all in the final
> plan. If you do some tests with multiple partial indexes you will
> probably see a bigger planning-time penalty.

Hmmm. Maybe we could get a bit smarter by looking at the attnums of each 
clause before doing the expensive stuff (which is predicate_implied_by I 
believe), exploiting a few simple observations:
  * if the clause is already covered by attrs_used, we don't need to    process it at all
  * if the clause uses attributes not included in the index predicate,    we know it can't be implied

Of course, those are local optimizations, and can't fix some of the 
problems (e.g. a lot of partial indexes).

> Perhaps we should bite the bullet and do it anyway, but I'm pretty
> suspicious of any claim that the planning cost is minimal.

Perhaps - I'm not claiming the planning cost is minimal. It was in the 
tests I've done, but no doubt it's possible to construct examples where 
the planning time will get much worse. With 30 partial indexes, I got an 
increase from 0.01 ms to ~2.5ms on simple queries.

But maybe we could get at least some of the benefits by planning the 
index scans like today, and then do the IOS check later? Of course, this 
won't help with cases where the index scan is thrown away while the 
index only scan would win, but it does help with cases where we end up 
doing index scan anyway?

That's essentially what I'm struggling right now - I do have a 3TB data 
set, the plan looks like this:
                               QUERY PLAN
------------------------------------------------------------------------ Sort  (cost=1003860164.92..1003860164.92
rows=1width=16)   Sort Key: orders.o_orderpriority   ->  HashAggregate         Group Key: orders.o_orderpriority
->  Merge Semi Join               Merge Cond:               ->  Index Scan using pk_orders on orders
Filter: ((o_orderdate >= '1997-07-01'::date) AND                     (o_orderdate < '1997-10-01 00:00:00'::timestamp))
            ->  Index Scan using lineitem_l_orderkey_idx_part1 on                   lineitem

and the visibility checks from Index Scans are killing the I/O. An IOS 
is likely to perform much better here (but haven't ran the query yet).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Jeff Janes

Дата:

25 августа 2015 г., 20:19:36

On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Hi,

currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment:

* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same
* is true for attributes used only in index quals, if we are certain
* that the index is not lossy.) However, it would be quite expensive
* to determine that accurately at this point, so for now we take the
* easy way out.

In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction).

The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple:

create table t as select i as a, i as b from
generate_series(1,10000000) s(i);

create index tidx_partial on t(b) where a > 1000 and a < 2000;

vacuum freeze t;
analyze t;

explain analyze select count(b) from t where a > 1000 and a < 2000;

However, "explain analyze select sum(b) from t where a > 1000 and a < 1999;" still doesn't use the index only

scan. Isn't that also implied by the predicate?

Cheers,

Jeff

Re: PATCH: index-only scans with partial indexes

От

Anastasia Lubennikova

Дата:

04 сентября 2015 г., 14:28:40

25.08.2015 20:19, Jeff Janes пишет:

On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment:

* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same
* is true for attributes used only in index quals, if we are certain
* that the index is not lossy.) However, it would be quite expensive
* to determine that accurately at this point, so for now we take the
* easy way out.

In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction).

The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple:

create table t as select i as a, i as b from
generate_series(1,10000000) s(i);

create index tidx_partial on t(b) where a > 1000 and a < 2000;

vacuum freeze t;
analyze t;

explain analyze select count(b) from t where a > 1000 and a < 2000;

However, "explain analyze select sum(b) from t where a > 1000 and a < 1999;" still doesn't use the index only
scan. Isn't that also implied by the predicate?

In this example it doesn't use IndexOnlyScan correctly. If I understand partial indexes right, if index predicate and search clause are not equal, index scan must recheck values when it's fetching them.
'tidx_partial' in example above has no information about 'a' attribute, beside the index->indpred, so it is impossible to recheck qual without referencing to table.

In example:
create index tidx_partial on t(a) where a > 1000 and a < 2000;
explain analyze select sum(a) from t where a > 1000 and a < 1999;
it can use IndexOnlyScan.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: PATCH: index-only scans with partial indexes

От

Jeff Janes

Дата:

04 сентября 2015 г., 19:10:57

On Fri, Sep 4, 2015 at 4:28 AM, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:

25.08.2015 20:19, Jeff Janes пишет:
On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment:

* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same
* is true for attributes used only in index quals, if we are certain
* that the index is not lossy.) However, it would be quite expensive
* to determine that accurately at this point, so for now we take the
* easy way out.

In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction).

The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple:

create table t as select i as a, i as b from
generate_series(1,10000000) s(i);

create index tidx_partial on t(b) where a > 1000 and a < 2000;

vacuum freeze t;
analyze t;

explain analyze select count(b) from t where a > 1000 and a < 2000;

However, "explain analyze select sum(b) from t where a > 1000 and a < 1999;" still doesn't use the index only
scan. Isn't that also implied by the predicate?

In this example it doesn't use IndexOnlyScan correctly. If I understand partial indexes right, if index predicate and search clause are not equal, index scan must recheck values when it's fetching them.
'tidx_partial' in example above has no information about 'a' attribute, beside the index->indpred, so it is impossible to recheck qual without referencing to table.

In example:
create index tidx_partial on t(a) where a > 1000 and a < 2000;
explain analyze select sum(a) from t where a > 1000 and a < 1999;
it can use IndexOnlyScan.

Yes, of course. Thanks for the explanation, it is obvious now that you have explained it. I kept slipping into thinking that the predicate-dependent variables are included in the index but only when the predicate is met, but that isn't the case.

How can we evaluate Tom's performance concerns? I tried turning log_planner_stats on and using the regression test as a load generator, but I don't think that that is very demanding of a test.

Thanks,

Jeff

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

05 сентября 2015 г., 00:03:45

Hi,

On 09/04/2015 06:10 PM, Jeff Janes wrote:
>
> How can we evaluate Tom's performance concerns?  I tried
> turning log_planner_stats on and using the regression test as a load
> generator, but I don't think that that is very demanding of a test.

I've done a bit of benchmarking today, trying to measure how expensive
the additional checks are.

Using a simple table with just 4 columns and 1M rows

     CREATE TABLE t AS SELECT i AS a, i AS b, i AS c, i AS d
                         FROM generate_series(1,1000000) s(i);

with three different index sets:

   - no indexes
   - 40 regular indexes (indexes-1.sql)
   - 40 partial indexes (indexes-2.sql)

and two different query sets:

   - matching the partial indexes (queries-1.sql)
   - not matching the partial indexes (queries-2.sql)

which means 6 combinations:

    A: no indexes / queries-1
    B: no indexes / queries-2
    C: indexes-1 / queries-1
    D: indexes-1 / queries-2
    E: indexes-2 / queries-1
    F: indexes-2 / queries-2

A summary of 100 EXPLAIN timings looks like this:

master       A          B          C          D          E          F
-------------------------------------------------------------------------
min        0.10       0.10       0.30       0.29       0.66       0.23
max        1.07       1.00       2.13       1.98       4.52       1.59
median     0.49       0.52       0.31       0.33       0.68       1.12
average    0.43       0.35       0.62       0.49       1.01       0.89

patched     A          B          C          D          E          F
-------------------------------------------------------------------------
min        0.11       0.11       0.29       0.29       0.70       0.22
max        0.99       1.05       0.55       1.93       3.79       1.12
median     0.19       0.55       0.32       0.34       0.74       0.24
average    0.42       0.52       0.34       0.55       0.95       0.27

A-D should be exactly the same, because there are no partial indexes,
and the results match that expectation.

E and F should be different, depending on how expensive the additional
checks are. But in this benchmark that's not true - the patched version
is actually a bit faster, thanks to noise.

I find that a bit strange, but I repeated the benchmark about 3x just to
verify it really behaves like this. Maybe I did some stupid mistake and
the results are useless, or maybe it needs to be more complex (e.g. the
conditions must not be exactly the same). So if someone could rerun the
benchmark and review it, that'd be nice.

Judging the cost/benefit ratio is a bit tricky. We need to identify the
cases where additional planning complexity makes it measurably slower,
without getting better performance at execution. And then we need to
somehow argue whether those cases are frequent enough or not.

ISTM that the worst case would be a data set with many partial indexes,
that however don't allow IOS. And the amount of data would have to be
small, so that the queries don't take too much time (which would make
the additional planning time noise).

However that was the idea of the benchmark, and I got no difference.

regards
Tomas

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: PATCH: index-only scans with partial indexes

От

Simon Riggs

Дата:

05 сентября 2015 г., 12:25:14

On 4 September 2015 at 22:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

A summary of 100 EXPLAIN timings looks like this:

master A B C D E F
-------------------------------------------------------------------------
min 0.10 0.10 0.30 0.29 0.66 0.23
max 1.07 1.00 2.13 1.98 4.52 1.59
median 0.49 0.52 0.31 0.33 0.68 1.12
average 0.43 0.35 0.62 0.49 1.01 0.89

What are these? Times? in ms?

However that was the idea of the benchmark, and I got no difference.

Please explain what this means and your conclusion, so its clear. That way we can either reject the patch or commit it. Thanks

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

05 сентября 2015 г., 20:17:52

Hi,

On 09/05/2015 10:53 AM, Simon Riggs wrote:
> On 4 September 2015 at 22:03, Tomas Vondra <tomas.vondra@2ndquadrant.com
> <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
>     A summary of 100 EXPLAIN timings looks like this:
>
>
>     master       A          B          C          D          E          F
>     -------------------------------------------------------------------------
>     min        0.10       0.10       0.30       0.29       0.66       0.23
>     max        1.07       1.00       2.13       1.98       4.52       1.59
>     median     0.49       0.52       0.31       0.33       0.68       1.12
>     average    0.43       0.35       0.62       0.49       1.01       0.89
>
>
> What are these? Times? in ms?

Yes, those are planning times in milliseconds. I've been thinking about 
possible issues in the benchmark, and I ended up with two main suspects:
  (a) environment - VM running on a laptop. thus quite noisy and      subject to various sources of overhead,
power-management,etc.
 
  (b) time measured using \timing in psql (by running EXPLAIN), so      probably influenced by formatting/transfer

So I reran the benchmark on a different machine (bare metal, pretty much 
no noise in the results), and measured the planning time using EXPLAIN 
ANALYZE (Planning Time). And I got this (milliseconds):
           A         B         C         D         E         F
----------------------------------------------------------------- min      0.04      0.04      0.11      0.10      0.37
    0.12 max      0.10      0.10      0.92      0.92      1.62      1.23 median   0.04      0.04      0.11      0.11
 0.37      0.13 average  0.04      0.04      0.11      0.11      0.38      0.14
 
           A         B         C         D         E         F
----------------------------------------------------------------- min      0.04      0.04      0.11      0.11      0.38
    0.13 max      0.10      0.10      0.92      0.94      1.64      1.21 median   0.04      0.04      0.11      0.11
 0.39      0.13 average  0.04      0.04      0.11      0.12      0.40      0.14
 

So much lower numbers (better CPU, no virtualization, etc.), but 
otherwise exactly the same conclusion - no overhead compared to master.

I think of three ways how to make the checks more expensive:
   (a) using more indexes
       The current benchmark already uses 40 indexes (and I've tried       with 100), and we've seen no impact at all.
Addingmore indexes       will eventually show some overhead, but the number of indexes       will be very high - I
doubtanyone has a table with hundreds of       partial indexes on a it.
 
   (b) using more complex index predicates
       I expect the predicate_implied_by() call to get more expensive       for more complex predicates. I however
believethat's quite       uncommon case - vast majority of index predicates that I've seen       use just a single
equalityclause.
 
   (c) using more complex queries (more WHERE conditions)
       Having more complex WHERE clauses seems quite plausible, though,       so I've decided to try it. Instead of the
simplequery used       before:
 
           select a from t where b >= 100 and b <= 200;
       I've used a query with a bunch of other conditions:
           select a from t where b >= 100 and b <= 200                             and c >= 100 and c <= 200
                and d >= 100 and d <= 200                             and a >= 100 and a <= 100;
 
       And indeed, this made a (tiny) difference - on the master, the       planning was 0.50 ms on average, while with
thepatch it was       0.55. But 0.05 ms is just barely above noise, even on this HW.
 
       Of course, this only impacts the case with partial indexes, all       the other cases were exactly the same with
andwithout patch.
 
>
>     However that was the idea of the benchmark, and I got no difference.
>
>
> Please explain what this means and your conclusion, so its clear. That
> way we can either reject the patch or commit it. Thanks

That means I've been unable to measure any significant overhead of the 
patch. There certainly are extreme cases where this patch might make the 
planning noticeably slower, but I believe those are rather artificial, 
and certainly wouldn't expect them in databases where a tiny increase of 
planning time would be a problem.

This benchmark however only looked at the planning overhead, but we 
should weight that with respect to possible gains. And IOS is a great 
optimization - it's not uncommon to see 2-3x improvements on databases 
that fit into RAM, and order of magnitude improvements on large 
databases (thanks to eliminating the random I/O when accessing the heap).

So my opinion is that we should commit this patch.

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kevin Grittner

Дата:

13 сентября 2015 г., 21:03:34

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> That means I've been unable to measure any significant overhead
> of the patch.

I've run a lot of benchmarks, and with anything resembling a common
query the differences in planning time are lost in the noise.  (I
didn't create a better example than Tomas of where a lot of indexes
cause a minimal increase in planning time.)
The test environment is a "bare iron" machine with:
1 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (4 cores; 8 HW threads)16GB DDR3 RAM2 1TB drives in RAID 1ubuntu 14.04 LTS
64-bitvariouscheckouts from master, most recently a7212a99no cassert, default optimizations

As one example, to get a heap bigger than RAM I set up like this:

drop table if exists t;
create table t (a int not null, b int not null, x text not null);
insert into t select i, i, repeat(md5(i::text), 50)   from generate_series(1,10000000) s(i);
vacuum freeze t;
checkpoint;

I ran one-index tests like this:

create index t_b_partial on t(b) where a > 1000 and a < 2000;
vacuum analyze t;
checkpoint;
explain (analyze, buffers, verbose) select count(b) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose) select count(a) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose) select count(*) from t where a > 1000 and a < 2000;

... then two-index tests like this:

create index t_b_a_partial on t(b, a) where a > 1000 and a < 2000;
vacuum analyze t;
checkpoint;
explain (analyze, buffers, verbose) select count(b) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose) select count(a) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose) select count(*) from t where a > 1000 and a < 2000;

All queries were run 5 times and (to minimize stray slowdowns from
other sources on this desktop machine) I took the minimum plan time
and minimum execution time.  (My browser and other optional
processes were stopped to also minimize noise, but the results
still had more noise than I would prefer.)

master - single index
---------------------
Planning time: 0.078 ms
Execution time: 0.544 ms

Planning time: 0.079 ms
Execution time: 0.533 ms

Planning time: 0.066 ms
Execution time: 0.491 ms

master - both indexes
---------------------
Planning time: 0.080 ms
Execution time: 0.396 ms

Planning time: 0.076 ms
Execution time: 0.373 ms

Planning time: 0.056 ms
Execution time: 0.275 ms

patched - single index
----------------------
Planning time: 0.032 ms
Execution time: 0.136 ms

Planning time: 0.079 ms
Execution time: 0.537 ms

Planning time: 0.050 ms
Execution time: 0.213 ms

patched - both indexes
----------------------
Planning time: 0.100 ms
Execution time: 0.373 ms

Planning time: 0.067 ms
Execution time: 0.251 ms

Planning time: 0.065 ms
Execution time: 0.240 ms

In my view, the most disappointing thing about the patch is that
when both indexes are present, it doesn't use the narrower one.  If
*only* the narrower index is present, it runs the index-only scan
using that index for count(b) and count(*), which is faster.  Can
we wrangle this patch into making a better choice among available
index-only scans?

It also seems disappointing that we don't recognize that
count(columnname) could be treated as a synonym for count(*) if
columnname is NOT NULL, but that doesn't seem like material for
this patch.

Benchmarking took so much time I did not get to a close review of
the code changes.  :-(  Based on just the test results, it appears
to me that the patch as it stands would be a net win for the vast
majority of workloads where it would make any noticeable difference,
and I didn't manage to create any big downside.  I would like to
see whether we can't improve the choice of partial index when there
are multiple possibilities -- it seems quite surprising to see
identical estimates for indexes of different column counts and key
widths, and to see the wider index chosen when the narrow one is
clearly (and predictably) faster.

I am changing this to Waiting on Author.

I will be on vacation without Internet access for the next 15 days,
so hopefully someone else can have a look when a new version is
posted.  If it's still open I'll have a look when I get back.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

14 сентября 2015 г., 00:21:43

Hi,

On 09/13/2015 08:03 PM, Kevin Grittner wrote:
>
> In my view, the most disappointing thing about the patch is that
> when both indexes are present, it doesn't use the narrower one.  If
> *only* the narrower index is present, it runs the index-only scan
> using that index for count(b) and count(*), which is faster.  Can
> we wrangle this patch into making a better choice among available
> index-only scans?

That's indeed strange, but after poking into that for a while, it seems 
rather like a costing issue. Let me demonstrate:

create table t (a int, b int);
insert into t select i,i from generate_series(1,1000000) s(i);

create index idx1 on t (a)   where b between 300000 and 600000;
create index idx2 on t (a,b) where b between 300000 and 600000;

vacuum t;
analyze t;

explain select a from t where b between 300000 and 600000;
                            QUERY PLAN
--------------------------------------------------------------------------- Index Only Scan using idx2 on t
(cost=0.42..9236.88rows=297823 width=4)   Index Cond: ((b >= 300000) AND (b <= 600000))
 
(2 rows)

drop index idx2;
                            QUERY PLAN
--------------------------------------------------------------------------- Index Only Scan using idx1 on t
(cost=0.42..9236.88rows=297823 width=4)
 
(1 row)

Now, both plans are index only scans, but the first one has Index Cond 
and the other one does not!

I've put a bunch of logging into cost_index(), and turns out that while 
the final cost is *exactly* the same, it's most likely by chance. After 
we call amcostestimate, we get these two results:

idx1: indexStartupCost=0.422500 indexTotalCost=4769.537500      indexSelectivity=0.297823 indexCorrelation=1.000000

idx2: indexStartupCost=0.422500 indexTotalCost=6258.652500      indexSelectivity=0.297823 indexCorrelation=0.750000

So amcostestimate does make a difference, and we get

idx1: run_cost = 4769.115000
idx2: run_cost = 6258.230000

and then for both indexes
    tuples_fetched=297823.000000    loop_count=1.000000    pages_fetched = 0.000000

but then we do
    run_cost += cpu_per_tuple * tuples_fetched;

and we end up with
    run_cost = 9236.460000

for both indexes. How's that possible? Number of tuples fetched is 
exactly the same for both indexes (297823), so clearly cpu_per_tuple 
must be different. That however seems a bit strange, because the only 
difference between the indexes is the additional column, and the 
condition should be covered by the index predicate ...

It seems that the problem is related to this:
    qpquals        = extract_nonindex_conditions(baserel->baserestrictinfo,
path->indexquals);

while the "larger" index on (a,b) gets
    path->indexquals=(b BETWEEN 300000 AND 600000)    qpquals=NIL

the "smaller" index on (a) gets
    path->indexquals=NIL    qpquals=(b BETWEEN 300000 AND 600000)

And so the larger index gets qpqual_cost=0, the smaller one gets 
qpqual_cost=0.005, and so cpu_per_tuple is either 0.01 or 0.015.

Which is exactly the difference between costs from amcostestimate

idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
idx2: 6258.230000 + 0.010 * 297823 = 9236.460000

Sppoky! Although it seems like a mere coincidence, thanks to the nice 
round numbers of tuples in the table, and lucky choice of two conditions.

For example after replacing the BETWEEN condition (which is actually two 
conditions) with a single one (b<300000) - both in the indexes and 
query, I get this:
                           QUERY PLAN
--------------------------------------------------------------------------- Index Only Scan using idx2 on t
(cost=0.42..8541.25rows=299507 width=4)   Index Cond: (b < 300000)
 
(2 rows)

drop index idx2;
                           QUERY PLAN
--------------------------------------------------------------------------- Index Only Scan using idx1 on t
(cost=0.42..8541.43rows=299507 width=4)
 
(1 row)

The plans are not costed exactly the same anymore (I'm not saying the 
costs are correct - clearly still the 'larger' index was preferred).

It's not bound to index only scan either - after adding another column 
to the table, and requesting it from the query (so preventing IOS), I 
get exactly the same issues.

I really wonder why we get different path->indexquals for those indexes, 
because that's the root of the issue here. Any ideas?

>
> It also seems disappointing that we don't recognize that
> count(columnname) could be treated as a synonym for count(*) if
> columnname is NOT NULL, but that doesn't seem like material for
> this patch.

Yeah, that's really not what this patch deals with.

>
> Benchmarking took so much time I did not get to a close review of
> the code changes.  :-(  Based on just the test results, it appears
> to me that the patch as it stands would be a net win for the vast
> majority of workloads where it would make any noticeable difference,
> and I didn't manage to create any big downside.  I would like to
> see whether we can't improve the choice of partial index when there
> are multiple possibilities -- it seems quite surprising to see
> identical estimates for indexes of different column counts and key
> widths, and to see the wider index chosen when the narrow one is
> clearly (and predictably) faster.
>
> I am changing this to Waiting on Author.
>
> I will be on vacation without Internet access for the next 15 days,
> so hopefully someone else can have a look when a new version is
> posted.  If it's still open I'll have a look when I get back.

Thanks for the feedback!

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

14 сентября 2015 г., 10:36:08

Hi,

At Sun, 13 Sep 2015 23:21:30 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<55F5E8DA.8080303@2ndquadrant.com>
> That's indeed strange, but after poking into that for a while, it
> seems rather like a costing issue. Let me demonstrate:
...
> Now, both plans are index only scans, but the first one has Index Cond
> and the other one does not!

The seemingly removed IndexCond qual is counted as non-index
quals at the last in cost_index. The quals that the partial index
implies should be ignored on cost_estimation.

> I've put a bunch of logging into cost_index(), and turns out that
> while the final cost is *exactly* the same, it's most likely by
> chance. After we call amcostestimate, we get these two results:

So it is *not by chance* but a stable behavior defined by
algorithm.

> It seems that the problem is related to this:
> 
>     qpquals
>         = extract_nonindex_conditions(baserel->baserestrictinfo,
>                                       path->indexquals);
> 
> while the "larger" index on (a,b) gets
> 
>     path->indexquals=(b BETWEEN 300000 AND 600000)
>     qpquals=NIL
> 
> the "smaller" index on (a) gets
> 
>     path->indexquals=NIL
>     qpquals=(b BETWEEN 300000 AND 600000)
> 
> And so the larger index gets qpqual_cost=0, the smaller one gets
> qpqual_cost=0.005, and so cpu_per_tuple is either 0.01 or 0.015.
> 
> Which is exactly the difference between costs from amcostestimate
> 
> idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
> idx2: 6258.230000 + 0.010 * 297823 = 9236.460000

These calculations are exactly right, but you overlooked the
breakedown of indexTotalCost for idx2.

> Sppoky! Although it seems like a mere coincidence, thanks to the nice
> round numbers of tuples in the table, and lucky choice of two
> conditions.

As said above, it is not a conincidence. The exactly same
calculation about baserestrictinfo is simply calculated in
different places, cost_index for the former and
btcostestiamte(genericcostestimate) for the latter.

We should properly ignore or remove the implicitly-applied quals
for partial indexes on cost estimation.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

14 сентября 2015 г., 11:00:38

On 09/14/2015 09:35 AM, Kyotaro HORIGUCHI wrote:
> Hi,
,,,
>> Which is exactly the difference between costs from amcostestimate
>>
>> idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
>> idx2: 6258.230000 + 0.010 * 297823 = 9236.460000
>
> These calculations are exactly right, but you overlooked the
> breakedown of indexTotalCost for idx2.
>
>> Sppoky! Although it seems like a mere coincidence, thanks to the nice
>> round numbers of tuples in the table, and lucky choice of two
>> conditions.
>
> As said above, it is not a conincidence. The exactly same
> calculation about baserestrictinfo is simply calculated in
> different places, cost_index for the former and
> btcostestiamte(genericcostestimate) for the latter.

By "coincidence" I meant that we happened to choose such a number of 
conditions in the index predicate & query that this perfect match is 
possible. Apparently there are two places that manipulate the costs and 
in this particular case happen to perfectly compensate the effects.

As demonstrated by the example with a single condition, the costs may 
actually differ for different numbers of clauses (e.g. using a single 
clause makes the wider index - unexpectedly - cheaper).

>
> We should properly ignore or remove the implicitly-applied quals
> for partial indexes on cost estimation.

Probably. So far I've traced the difference to build_index_paths() where 
we build index_clauses by iterating over index columns - the smaller 
index does not have the column from the predicate, so we don't add the 
clause. I'm not particularly familiar with this part of the code, so I 
wonder where's the best place to fix this, though.

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

14 сентября 2015 г., 13:09:16

Hi, this looks to be a bug of cost_index(). The attached patch
would fix that.

=====
The following part in cost_index,

> cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
> 
> run_cost += cpu_per_tuple * tuples_fetched;

Adds, *cpu_tuple_cost* (which is 0.01) + qpqual_cost.per_tuple
(0.0025) per tuple even they are index tuples. On the other hand
getnericcostestimate adds the following value for the same deed.

> indexTotalCost += numIndexTuples * num_sa_scans * (cpu_index_tuple_cost + qual_op_cost);

cpu_index_tuple_cost is 0.005, just a half of cpu_tuple cost as
default. I think this should be the culprit of the difference.

For confirmation, setting cpu_tuple_cost to 0.05 to equate with
cpu_index_tuple_cost and the oppisit makes the estimate for both
indexes the same value.


set cpu_tuple_cost to 0.005;
explain select a from t where b < 300000;                               QUERY PLAN                                 
---------------------------------------------------------------------------Index Only Scan using idx2 on t
(cost=0.42..7022.06rows=297876 width=4)  Index Cond: (b < 300000)
 
(2 rows)

explain select a from t where b < 300000;                               QUERY PLAN                                 
---------------------------------------------------------------------------Index Only Scan using idx1 on t
(cost=0.42..7022.66rows=297876 width=4)
 
(1 row)

This should be a bug.  The attached patch would fix this and
perhaps costs for all of your examples should match except for
errors of double precision. I think it is enough since
IndexOnlyScan may not have quals on columns out of the index in
focus so qpquals should be index quals.

regards,


At Mon, 14 Sep 2015 10:00:24 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<55F67E98.5050904@2ndquadrant.com>
> On 09/14/2015 09:35 AM, Kyotaro HORIGUCHI wrote:
> > Hi,
> ,,,
> >> Which is exactly the difference between costs from amcostestimate
> >>
> >> idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
> >> idx2: 6258.230000 + 0.010 * 297823 = 9236.460000
> >
> > These calculations are exactly right, but you overlooked the
> > breakedown of indexTotalCost for idx2.
> >
> >> Sppoky! Although it seems like a mere coincidence, thanks to the nice
> >> round numbers of tuples in the table, and lucky choice of two
> >> conditions.
> >
> > As said above, it is not a conincidence. The exactly same
> > calculation about baserestrictinfo is simply calculated in
> > different places, cost_index for the former and
> > btcostestiamte(genericcostestimate) for the latter.
> 
> By "coincidence" I meant that we happened to choose such a number of
> conditions in the index predicate & query that this perfect match is
> possible. Apparently there are two places that manipulate the costs
> and in this particular case happen to perfectly compensate the
> effects.

Ok, I understood.

> As demonstrated by the example with a single condition, the costs may
> actually differ for different numbers of clauses (e.g. using a single
> clause makes the wider index - unexpectedly - cheaper).
> 
> >
> > We should properly ignore or remove the implicitly-applied quals
> > for partial indexes on cost estimation.
> 
> Probably. So far I've traced the difference to build_index_paths()
> where we build index_clauses by iterating over index columns - the
> smaller index does not have the column from the predicate, so we don't
> add the clause. I'm not particularly familiar with this part of the
> code, so I wonder where's the best place to fix this, though.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d107d76..d354dc2 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -516,7 +516,12 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count)    cost_qual_eval(&qpqual_cost,
qpquals,root);    startup_cost += qpqual_cost.startup;
 
-    cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+
+    /* Indexonly scan applies this qual on index tuples */
+    if (indexonly)
+        cpu_per_tuple = cpu_index_tuple_cost + qpqual_cost.per_tuple;
+    else
+        cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;    run_cost += cpu_per_tuple * tuples_fetched;

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

14 сентября 2015 г., 13:15:45

Sorry.

> Hi, this looks to be a bug of cost_index(). The attached patch
> would fix that.

No, that's wrong. please forget the patch. The qual in qpquals
should be indexquals which is excluded because it is not
necessary to be applied. The right way would be remove the cost
for qpqual in cost_index().

> =====
> The following part in cost_index,
> 
> > cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
> > 
> > run_cost += cpu_per_tuple * tuples_fetched;
> 
> Adds, *cpu_tuple_cost* (which is 0.01) + qpqual_cost.per_tuple
> (0.0025) per tuple even they are index tuples. On the other hand
> getnericcostestimate adds the following value for the same deed.
> 
> > indexTotalCost += numIndexTuples * num_sa_scans * (cpu_index_tuple_cost + qual_op_cost);
> 
> cpu_index_tuple_cost is 0.005, just a half of cpu_tuple cost as
> default. I think this should be the culprit of the difference.
> 
> For confirmation, setting cpu_tuple_cost to 0.05 to equate with
> cpu_index_tuple_cost and the oppisit makes the estimate for both
> indexes the same value.
> 
> 
> set cpu_tuple_cost to 0.005;
> explain select a from t where b < 300000;
>                                 QUERY PLAN                                 
> ---------------------------------------------------------------------------
>  Index Only Scan using idx2 on t  (cost=0.42..7022.06 rows=297876 width=4)
>    Index Cond: (b < 300000)
> (2 rows)
> 
> explain select a from t where b < 300000;
>                                 QUERY PLAN                                 
> ---------------------------------------------------------------------------
>  Index Only Scan using idx1 on t  (cost=0.42..7022.66 rows=297876 width=4)
> (1 row)
> 
> This should be a bug.  The attached patch would fix this and
> perhaps costs for all of your examples should match except for
> errors of double precision. I think it is enough since
> IndexOnlyScan may not have quals on columns out of the index in
> focus so qpquals should be index quals.
> 
> regards,
> 
> 
> At Mon, 14 Sep 2015 10:00:24 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<55F67E98.5050904@2ndquadrant.com>
> > On 09/14/2015 09:35 AM, Kyotaro HORIGUCHI wrote:
> > > Hi,
> > ,,,
> > >> Which is exactly the difference between costs from amcostestimate
> > >>
> > >> idx1: 4769.115000 + 0.015 * 297823 = 9236.460000
> > >> idx2: 6258.230000 + 0.010 * 297823 = 9236.460000
> > >
> > > These calculations are exactly right, but you overlooked the
> > > breakedown of indexTotalCost for idx2.
> > >
> > >> Sppoky! Although it seems like a mere coincidence, thanks to the nice
> > >> round numbers of tuples in the table, and lucky choice of two
> > >> conditions.
> > >
> > > As said above, it is not a conincidence. The exactly same
> > > calculation about baserestrictinfo is simply calculated in
> > > different places, cost_index for the former and
> > > btcostestiamte(genericcostestimate) for the latter.
> > 
> > By "coincidence" I meant that we happened to choose such a number of
> > conditions in the index predicate & query that this perfect match is
> > possible. Apparently there are two places that manipulate the costs
> > and in this particular case happen to perfectly compensate the
> > effects.
> 
> Ok, I understood.
> 
> > As demonstrated by the example with a single condition, the costs may
> > actually differ for different numbers of clauses (e.g. using a single
> > clause makes the wider index - unexpectedly - cheaper).
> > 
> > >
> > > We should properly ignore or remove the implicitly-applied quals
> > > for partial indexes on cost estimation.
> > 
> > Probably. So far I've traced the difference to build_index_paths()
> > where we build index_clauses by iterating over index columns - the
> > smaller index does not have the column from the predicate, so we don't
> > add the clause. I'm not particularly familiar with this part of the
> > code, so I wonder where's the best place to fix this, though.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

14 сентября 2015 г., 13:51:45

I rethinked on this from the first.

> Sorry.
> 
> > Hi, this looks to be a bug of cost_index(). The attached patch
> > would fix that.
> 
> No, that's wrong. please forget the patch. The qual in qpquals
> should be indexquals which is excluded because it is not
> necessary to be applied. The right way would be remove the cost
> for qpqual in cost_index().

Your patch allows index only scan even if a qual contains
non-index column when the qual can be removed by implication from
index predicates.

So the 'implied' clauses is not needed ever after. It should be
excluded from cost estimation and it is not needed on execution
even if index only scan is found not to be doable finally.

So the implicit quals may be removed on building index paths but
I think check_index_only is not the place.

Removing implied quals from index quals is not only for index
*only* scan so the place for removing such quals is in
build_index_paths, in the loop of step 1. After removing the
quals there, check_index_only will naturally give disired result.

# I remember that I have tried the same or similar thing. I don't
# recall what made me give up then.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

14 сентября 2015 г., 14:28:08

On 09/14/2015 12:51 PM, Kyotaro HORIGUCHI wrote:
> I rethinked on this from the first.
>
>> Sorry.
>>
>>> Hi, this looks to be a bug of cost_index(). The attached patch
>>> would fix that.
>>
>> No, that's wrong. please forget the patch. The qual in qpquals
>> should be indexquals which is excluded because it is not
>> necessary to be applied. The right way would be remove the cost
>> for qpqual in cost_index().
>
> Your patch allows index only scan even if a qual contains
> non-index column when the qual can be removed by implication from
> index predicates.
>
> So the 'implied' clauses is not needed ever after. It should be
> excluded from cost estimation and it is not needed on execution
> even if index only scan is found not to be doable finally.
>
> So the implicit quals may be removed on building index paths but
> I think check_index_only is not the place.
>
> Removing implied quals from index quals is not only for index
> *only* scan so the place for removing such quals is in
> build_index_paths, in the loop of step 1. After removing the
> quals there, check_index_only will naturally give disired result.
>
> # I remember that I have tried the same or similar thing. I don't
> # recall what made me give up then.

I don't think this is particularly related to the patch, because some of 
the anomalies can be observed even on master. For example, let's force 
the index scans to be non-IOS by requesting another column from the heap:
  create table t (a int, b int, c int);  insert into t select i,i,i from generate_series(1,1000000) s(i);
  create index idx1 on t (a)   where b between 300000 and 600000;  create index idx2 on t (a,b) where b between 300000
and600000;

  analyze t;  vacuum t;

The indexes have exactly the same size (thanks to alignment of 
IndexTuples), and should have exactly the same statistics:

test=# \di+                       List of relations Schema | Name | Type  | Owner | Table |  Size   | Description
--------+------+-------+-------+-------+---------+------------- public | idx1 | index | user  | t     | 6600 kB |
public| idx2 | index | user  | t     | 6600 kB |

(2 rows)

Now, let's see the query reading column 'c' (forcing heap fetches)

explain select c from t where b between 300000 and 600000;                              QUERY PLAN
----------------------------------------------------------------------- Index Scan using idx1 on t
(cost=0.42..10945.99rows=300971 width=4)

(1 row)

drop index idx1;
set enable_bitmapscan = off;

explain select c from t where b between 300000 and 600000;                              QUERY PLAN
----------------------------------------------------------------------- Index Scan using idx2 on t
(cost=0.42..19688.08rows=300971 width=4)   Index Cond: ((b >= 300000) AND (b <= 600000))

(2 rows)

I claim that either both or none of the indexes should use "Index Cond".

This is exactly the same reason that lead to the strange behavior after 
applying the patch, but in this case the heap access actually introduces 
some additional cost so the issue is not that obvious.

But in reality the costs should be pretty much exactly the same - the 
indexes have exactly the same size, statistics, selectivity etc.

Also, the plan difference suggests this is not merely a costing issue, 
because while with idx1 (first plan) it was correctly detected we don't 
need to evaluate the condition on the partial index, on idx2 that's not 
true and we'll waste time doing that. So we probably can't just tweak 
the costing a bit - this probably needs to be addressed when actually 
building the index path.

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

17 сентября 2015 г., 13:46:16

Hi, sorry in advance for hardly readable long descriptions..

At Mon, 14 Sep 2015 13:27:47 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<55F6AF33.8000206@2ndquadrant.com>
> I don't think this is particularly related to the patch, because some
> of the anomalies can be observed even on master. For example, let's
> force the index scans to be non-IOS by requesting another column from
> the heap:
...
> I claim that either both or none of the indexes should use "Index
> Cond".

Perhaps I understood your point and I think I understood this.

Planner dicides whether to use the partial index far before
creating specific paths, when measuring baserel sizes. If some
partial index is implied by the restriction list,
check_partial_indexes() marks the index as predOk, which means
that the index is usable for the baserel scan even if the
restriction clauses doesn't match the index columns.

Then create_index_paths allows to generating paths for partial
indexes with predOK. Index clauses of the created paths for the
partial indexes don't contain clauses doesn't match the index
columns. It is empty for your first example and it is the same
with restrictinfo for the second.

Finally, create_indexscan_plan strips baserestriction caluses
implied by index predicate in addtion to index conditions. So
both of your example has no filter conditions.

The following example would show you the result of the above
steps.

explain select c from t where b between 300000 + 1 and 600000 and c = 3;                           QUERY PLAN
                
 
------------------------------------------------------------------Index Scan using idx1 on t  (cost=0.42..11665.77
rows=1width=4)  Filter: ((b >= 300001) AND (c = 3))
 

The index predicate (b >= 300000 AND b <= 600000) implies the
restrction (b >= 300001 AND b <= 600000) so idx1 is allowed to be
used, then conversely the restriction b >= 300001 is implied by
the index predicate b >= 300000 so it is not shown as Index Cond
and the other two are not, so they are shown and executed as
Filter.

But regardless of whether stripped as implied conditions or not
at planning phase, the cost for all clauses that don't match the
index columns are added when creating index paths. That will be
one of the cause of cost error.

> This is exactly the same reason that lead to the strange behavior
> after applying the patch, but in this case the heap access actually
> introduces some additional cost so the issue is not that obvious.

So you're right on the point that check_index_only is doing
wrong. It should properly ignore restrictions implied by index
predicates as your patch is doing. But cost_index doesn't know
that some nonindex-conditions of baserestrictinfo is finally
useless, and it is assuming that nonindex conditions are always
applied on heap tuples.


After all, what should be done to properly ignore useless
conditions would be,
1. Make create_index_paths() to make the list of restrict   clauses which are implied by the index predicate of the
index  in focus. The list would be stored as a member in   IndexOptInfo. Then create index clauses excluding the listed
 clauses and call get_index_paths using them. Modify   check_index_only() to ignore the listed clauses when pulling
varattnos.This is similar but different a bit to what I said   in the previous mail.
 
2. Pass the listed clauses to generated IndexPath.
3. Modify extract_nonindex_conditions to be called with the   exclusion list and the cluases are exluded from the
resultof   the function.
 
4. Make create_indexscan_plan to extract qpqual excluding the   exclusion list.

The same result could be obtained by more smarter way but the
steps above will archive right decision on whether to do index
only scan on partial index and correct cost estimate propery
ignoring unused restrictions.

Does this make sense for you?

> But in reality the costs should be pretty much exactly the same - the
> indexes have exactly the same size, statistics, selectivity etc.
> 
> Also, the plan difference suggests this is not merely a costing issue,
> because while with idx1 (first plan) it was correctly detected we
> don't need to evaluate the condition on the partial index, on idx2
> that's not true and we'll waste time doing that. So we probably can't
> just tweak the costing a bit - this probably needs to be addressed
> when actually building the index path.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

17 сентября 2015 г., 18:40:35

Hello Horiguchi-san,

On 09/17/2015 12:45 PM, Kyotaro HORIGUCHI wrote:
>
> After all, what should be done to properly ignore useless
> conditions would be,
>
>   1. Make create_index_paths() to make the list of restrict
>      clauses which are implied by the index predicate of the index
>      in focus. The list would be stored as a member in
>      IndexOptInfo. Then create index clauses excluding the listed
>      clauses and call get_index_paths using them. Modify
>      check_index_only() to ignore the listed clauses when pulling
>      varattnos. This is similar but different a bit to what I said
>      in the previous mail.
>
>   2. Pass the listed clauses to generated IndexPath.
>
>   3. Modify extract_nonindex_conditions to be called with the
>      exclusion list and the cluases are exluded from the result of
>      the function.
>
>   4. Make create_indexscan_plan to extract qpqual excluding the
>      exclusion list.
>
> The same result could be obtained by more smarter way but the
> steps above will archive right decision on whether to do index
> only scan on partial index and correct cost estimate propery
> ignoring unused restrictions.
>
> Does this make sense for you?

Yes, this seems sane. I've been poking at this a bit too, and I came to 
the same plan in general, except that I think it's better to build list 
of clauses that are *not* implied by the index, because that's what we 
need both in cost_index and check_index_only.

It also seems to me that this change (arguably a bug fix) should pretty 
much make the original patch irrelevant, because check_index_only can 
simply walk over the new list.

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

18 сентября 2015 г., 04:46:39

Hello,

At Thu, 17 Sep 2015 17:40:27 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<55FADEEB.4000907@2ndquadrant.com>
> Yes, this seems sane. I've been poking at this a bit too, and I came
> to the same plan in general, except that I think it's better to build
> list of clauses that are *not* implied by the index, because that's
> what we need both in cost_index and check_index_only.

I intended to isolate IndexOptInfo from belonging RelOptInfo but
the exclusion list also bonds them tightly, and one IndexOptInfo
belongs to only one RelOptInfo so no need to isolate. So
not-implied-restrictclauses in IndexOptInfo would be preferable.

> It also seems to me that this change (arguably a bug fix) should
> pretty much make the original patch irrelevant, because
> check_index_only can simply walk over the new list.

Yeah. This seems to be a bug irrelevant to your index-only-scan
ptch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

26 сентября 2015 г., 14:29:00

Hi,

On 09/18/2015 03:46 AM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> At Thu, 17 Sep 2015 17:40:27 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<55FADEEB.4000907@2ndquadrant.com>
>> Yes, this seems sane. I've been poking at this a bit too, and I came
>> to the same plan in general, except that I think it's better to build
>> list of clauses that are *not* implied by the index, because that's
>> what we need both in cost_index and check_index_only.
>
> I intended to isolate IndexOptInfo from belonging RelOptInfo but
> the exclusion list also bonds them tightly, and one IndexOptInfo
> belongs to only one RelOptInfo so no need to isolate. So
> not-implied-restrictclauses in IndexOptInfo would be preferable.
>
>> It also seems to me that this change (arguably a bug fix) should
>> pretty much make the original patch irrelevant, because
>> check_index_only can simply walk over the new list.
>
> Yeah. This seems to be a bug irrelevant to your index-only-scan
> ptch.

Attached is a patch that makes this work correctly, I believe. The only
difference compared to the v1 of the patch is this piece of code in
match_clause_to_index (indexpath.c):

if ((index->indpred != NIL) &&
     (predicate_implied_by(lappend(NIL, rinfo->clause), index->indpred)))
     continue;

which effectively says that we should ignore clauses implied by the
index predicate.

This fixes the way we build IndexClauseSet for the two indexes -
originally we'd add the implied clause on the large index but not on the
small one (because it does not have the column referenced in the
condition). And as far as I can say, this also fixes all the costing
issues that I've described because cost_index sees the same thing for
both indexes.

It's also early enough to fix the actual path, so EXPLAIN shows the same
thing for both indexes (so no "Index Cond" present for only one of the
indexes).

I've originally tried to fix this in extract_nonindex_conditions, but
that's way too late - it can only fix the costing, not the path.

Arguably this part of the patch is a bugfix of an issue present on master.

The patch does not change the check_index_only implementation - it still
needs to check the clauses, just like in v1 of the patch. To make this
re-check unnecessary, we'd have to stick the remaining clauses
somewhere, so that check_index_only can use only the filtered list
(instead of walking through the complete list of restrictions).

Or perhaps we can do the check in match_clause_to_index, pretty much for
free? If the clause is not implied by the index predicate (which we know
thanks to the fix), and we don't assign it to any of the index columns,
it means we can'd use IOS, no?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

partial-index-only-scan-v2.patch

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

26 сентября 2015 г., 19:00:47

Hi,

On 09/26/2015 01:28 PM, Tomas Vondra wrote:

> The patch does not change the check_index_only implementation - it
> still needs to check the clauses, just like in v1 of the patch. To
> make this re-check unnecessary, we'd have to stick the remaining
> clauses somewhere, so that check_index_only can use only the filtered
> list (instead of walking through the complete list of restrictions).

After thinking about this a bit more, I realized we already have a good
place for keeping this list - IndexClauseSet. So I've done that, and
removed most of the code from check_index_only - it still needs to
decide whether to use the full list of restrictions (regular indexes) or
the filtered list (for partial indexes).

Calling predicate_implied_by in match_clause_to_index makes the check a
bit earlier, compared to the v1 of the patch. So theoretically there
might be cases where we'd interrupt the execution between those places,
and the v1 would not pay the price for the call. But those places are
fairly close together, so I find that unlikely and I've been unable to
reproduce a plausible example of such regression despite trying.

The only regression test this breaks is "aggregates", and I believe
that's actually correct because it changes the plans like this:

        ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2
              Index Cond: (f1 IS NOT NULL)
        ->  Index Only Scan using minmaxtest3i on minmaxtest3
-            Index Cond: (f1 IS NOT NULL)

i.e. it removes the Index Condition from scans of minmaxtest3i, which is
perfectly sensible because the index is defined like this:

create index minmaxtest3i on minmaxtest3(f1) where f1 is not null;

So it's partial index and that condition is implied by the predicate
(it's actually exactly the same).

I haven't fixed those regression tests for now.

>
> Or perhaps we can do the check in match_clause_to_index, pretty much
> for free? If the clause is not implied by the index predicate (which
> we know thanks to the fix), and we don't assign it to any of the
> index columns, it means we can'd use IOS, no?

After thinking about this a bit more, this is clearly nonsense. It fails
on conditions referencing multiple columns (WHERE a=b) which can't be
assigned to a single index column. So the logic would have to be much
more complicated, effectively doing what check_index_only is doing just
a tiny bit later.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

partial-index-only-scan-v3.patch

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

29 сентября 2015 г., 13:27:42

Hi,

At Sat, 26 Sep 2015 18:00:33 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<5606C121.10205@2ndquadrant.com>
> Hi,
> 
> On 09/26/2015 01:28 PM, Tomas Vondra wrote:
> 
> > The patch does not change the check_index_only implementation - it
> > still needs to check the clauses, just like in v1 of the patch. To
> > make this re-check unnecessary, we'd have to stick the remaining
> > clauses somewhere, so that check_index_only can use only the filtered
> > list (instead of walking through the complete list of restrictions).
> 
> After thinking about this a bit more, I realized we already have a
> good place for keeping this list - IndexClauseSet. So I've done that,

The place looks fine but it might be better that rclauses have
baserestrictinfo itself when indpred == NIL. It would make the
semantics of rclauses more consistent.

> and removed most of the code from check_index_only - it still needs to
> decide whether to use the full list of restrictions (regular indexes)
> or the filtered list (for partial indexes).

The change above allows to reduce the modification still left in
check_index_only.

> Calling predicate_implied_by in match_clause_to_index makes the check
> a bit earlier, compared to the v1 of the patch. So theoretically there
> might be cases where we'd interrupt the execution between those
> places, and the v1 would not pay the price for the call. But those
> places are fairly close together, so I find that unlikely and I've
> been unable to reproduce a plausible example of such regression
> despite trying.
> 
> The only regression test this breaks is "aggregates", and I believe
> that's actually correct because it changes the plans like this:
> 
>        ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2
>              Index Cond: (f1 IS NOT NULL)
>        ->  Index Only Scan using minmaxtest3i on minmaxtest3
> -            Index Cond: (f1 IS NOT NULL)
> 
> i.e. it removes the Index Condition from scans of minmaxtest3i, which
> is perfectly sensible because the index is defined like this:
> 
> create index minmaxtest3i on minmaxtest3(f1) where f1 is not null;
> 
> So it's partial index and that condition is implied by the predicate
> (it's actually exactly the same).

Agreed. It looks an unexpected-but-positive product.

> I haven't fixed those regression tests for now.
> 
> >
> > Or perhaps we can do the check in match_clause_to_index, pretty much
> > for free? If the clause is not implied by the index predicate (which
> > we know thanks to the fix), and we don't assign it to any of the
> > index columns, it means we can'd use IOS, no?
> 
> After thinking about this a bit more, this is clearly nonsense. It
> fails on conditions referencing multiple columns (WHERE a=b) which
> can't be assigned to a single index column. So the logic would have to
> be much more complicated, effectively doing what check_index_only is
> doing just a tiny bit later.

cost_index() seems to need to be fixed. It would count excluded
clauses in estimate.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

29 сентября 2015 г., 17:57:11

Hello,

On 09/29/2015 12:27 PM, Kyotaro HORIGUCHI wrote:
> Hi,
>
> At Sat, 26 Sep 2015 18:00:33 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<5606C121.10205@2ndquadrant.com>
>> Hi,
>>
>> On 09/26/2015 01:28 PM, Tomas Vondra wrote:
>>
>>> The patch does not change the check_index_only implementation - it
>>> still needs to check the clauses, just like in v1 of the patch. To
>>> make this re-check unnecessary, we'd have to stick the remaining
>>> clauses somewhere, so that check_index_only can use only the filtered
>>> list (instead of walking through the complete list of restrictions).
>>
>> After thinking about this a bit more, I realized we already have a
>> good place for keeping this list - IndexClauseSet. So I've done that,
>
> The place looks fine but it might be better that rclauses have
> baserestrictinfo itself when indpred == NIL. It would make the
> semantics of rclauses more consistent.
>
>> and removed most of the code from check_index_only - it still needs to
>> decide whether to use the full list of restrictions (regular indexes)
>> or the filtered list (for partial indexes).
>
> The change above allows to reduce the modification still left in
> check_index_only.

I'm not sure I understand what change you suggest? Could you explain?

The change in check_index_only is effectively just (a) comment update
and (b) choice of the right list of clauses. I'd say both changes are 
needed, although (b) could happen outside check_index_only (i.e. we 
could do the check elsewhere). Is that what you mean?

>
> cost_index() seems to need to be fixed. It would count excluded
> clauses in estimate.

Hmm, good point. The problem is that extract_nonindex_conditions uses 
baserel->baserestrictinfo again, i.e. it does not skip the implied 
clauses. So we may either stick the filtered clauses somewhere (for 
example in the IndexPath), teach extract_nonindex_conditions to use 
predicate_implied_by. I'd say the first option is better. Agreed?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

29 сентября 2015 г., 19:56:54

On 09/29/2015 04:57 PM, Tomas Vondra wrote:
> Hello,
>
> On 09/29/2015 12:27 PM, Kyotaro HORIGUCHI wrote:

...

>>
>> cost_index() seems to need to be fixed. It would count excluded
>> clauses in estimate.
>
> Hmm, good point. The problem is that extract_nonindex_conditions uses
> baserel->baserestrictinfo again, i.e. it does not skip the implied
> clauses. So we may either stick the filtered clauses somewhere (for
> example in the IndexPath), teach extract_nonindex_conditions to use
> predicate_implied_by. I'd say the first option is better. Agreed?

And the attached patch v4 should do the trick - it adds 'indexrinfos' to
IndexPath and uses it in cost_index().

CREATE TABLE t AS SELECT i AS a, i AS b, i AS c
                     FROM generate_series(1,1000) s(i);

CREATE INDEX idx ON t(a) WHERE b > 1000;

Then

SELECT a FROM t WHERE b > 1000 AND a < 1000; /* size(qpquals) = 0 */
SELECT a FROM t WHERE b > 1000 AND c < 1000; /* size(qpquals) = 1 */
SELECT a FROM t WHERE c > 1000 AND c < 1000; /* size(qpquals) = 2 */

and so on. Which seems correct I believe.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

partial-index-only-scan-v4.patch

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

30 сентября 2015 г., 04:55:04

Hello,

At Tue, 29 Sep 2015 16:57:03 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<560AA6BF.5030805@2ndquadrant.com>
> >>> The patch does not change the check_index_only implementation - it
> >>> still needs to check the clauses, just like in v1 of the patch. To
> >>> make this re-check unnecessary, we'd have to stick the remaining
> >>> clauses somewhere, so that check_index_only can use only the filtered
> >>> list (instead of walking through the complete list of restrictions).
> >>
> >> After thinking about this a bit more, I realized we already have a
> >> good place for keeping this list - IndexClauseSet. So I've done that,
> >
> > The place looks fine but it might be better that rclauses have
> > baserestrictinfo itself when indpred == NIL. It would make the
> > semantics of rclauses more consistent.
> >
> >> and removed most of the code from check_index_only - it still needs to
> >> decide whether to use the full list of restrictions (regular indexes)
> >> or the filtered list (for partial indexes).
> >
> > The change above allows to reduce the modification still left in
> > check_index_only.
> 
> I'm not sure I understand what change you suggest? Could you explain?
> 
> The change in check_index_only is effectively just (a) comment update
> and (b) choice of the right list of clauses. I'd say both changes are
> needed, although (b) could happen outside check_index_only (i.e. we
> could do the check elsewhere). Is that what you mean?

I meant that the (b) could be done earlier in
match_clause_to_index. Currently clauseset->rclauses stores given
(restrict) clauses which are not implied by indpred *only when*
indpred is present. But it's no problem that rclauses *always*
stores such clauses. rclauses is still storing clauses "not
implied by the index" for the case. It is what I meant by "more
consistent".

The following codelet would be more clearer than my crumsy
words:)

in match_clause_to_index()
>       if (index->indpred != NIL &&
>           predicate_implied_by(list_make1(rinfo->clause), index->indpred))
>               return;  /* ignore clauses implied by index */
>
>       /* track all clauses not implied by indpred */
>       clauseset->rclauses = lappend(clauseset->rclauses, rinfo);
>
>       for (indexcol = 0; indexcol < index->ncolumns; indexcol++)

in check_index_only()
-     * For partial indexes use the filtered clauses, otherwise use the
-     * baserestrictinfo directly for non-partial indexes.
-     */
-    rclauses = (index->indpred != NIL) ? clauses : rel->baserestrictinfo;
-
-    /*     * Add all the attributes used by restriction clauses (only those not     * implied by the index predicate
forpartial indexes).     */
 
-    foreach(lc, rclauses)
+    foreach(lc, clauses)

> > cost_index() seems to need to be fixed. It would count excluded
> > clauses in estimate.
> 
> Hmm, good point. The problem is that extract_nonindex_conditions uses
> baserel->baserestrictinfo again, i.e. it does not skip the implied
> clauses. So we may either stick the filtered clauses somewhere (for
> example in the IndexPath), teach extract_nonindex_conditions to use
> predicate_implied_by. I'd say the first option is better. Agreed?

I'll mention this in a separate reply for the following mail.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

30 сентября 2015 г., 11:30:07

Hello, it looks fine.

> >> cost_index() seems to need to be fixed. It would count excluded
> >> clauses in estimate.
> >
> > Hmm, good point. The problem is that extract_nonindex_conditions uses
> > baserel->baserestrictinfo again, i.e. it does not skip the implied
> > clauses. So we may either stick the filtered clauses somewhere (for
> > example in the IndexPath), teach extract_nonindex_conditions to use
> > predicate_implied_by. I'd say the first option is better. Agreed?
> 
> And the attached patch v4 should do the trick - it adds 'indexrinfos'
> to IndexPath and uses it in cost_index().

It seems to work as expected. It'd also be reasonable to have
"indexrinfos" in IndexPath since cost_index needs it. Loose join
clauses and recheck conditions are not broken.

By the way your comment for indexrinfos is as following,

> * 'indexrinfos' is a list of RestrictInfo nodes from the query's WHERE
> * or JOIN conditions, excluding those implied by the index predicate
> * (if the index is not partial, the list includes all restriction clauses).

But the v4 patch instead leaves it empty for non-partial
indexes:) I prefer to follow this comment because looking the
condition (index->indpred != NIL) for such purpose in
build_index_paths is somewhat uneasy for me. But I don't insist
on that if you choose to avoid useless memory and clock
consumption to construct a list which is not so meaningful for
non-partial indexes (it is almost all cases).

The following comment in match_clause_to_index does not need to
be a 'XXX' comment. What made you to feel to do so? (I rather
feel that it is not necessary at all.)

> * XXX We must do this before trying to match the index to index
> *     columns, because the index predicates may use columns not
> *     used in the index itself.

Anyway some description on rclauses should be added in the
comment for match_clause_to_index, instead of the comments
currently added *within* the function.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

30 сентября 2015 г., 13:55:49

Hello!

On 09/30/2015 10:29 AM, Kyotaro HORIGUCHI wrote:

> By the way your comment for indexrinfos is as following,
>
>> * 'indexrinfos' is a list of RestrictInfo nodes from the query's WHERE
>> * or JOIN conditions, excluding those implied by the index predicate
>> * (if the index is not partial, the list includes all restriction clauses).
>
> But the v4 patch instead leaves it empty for non-partial
> indexes:) I prefer to follow this comment because looking the
> condition (index->indpred != NIL) for such purpose in
> build_index_paths is somewhat uneasy for me. But I don't insist
> on that if you choose to avoid useless memory and clock
> consumption to construct a list which is not so meaningful for
> non-partial indexes (it is almost all cases).

Good point. I think we may simply point indexrinfos to the existing list 
of restrictions in that case - we don't need to copy it. So no 
additional memory / CPU consumption, and it should make the code working 
with it a bit simpler.

>
> The following comment in match_clause_to_index does not need to be a
> 'XXX' comment. What made you to feel to do so? (I rather feel that it
> is not necessary at all.)

I agree that it may not be really necessary. I added the comment because 
initially I've made the check inside the for loop and then spent some 
time investigating why it's not working as expected.

>
>> * XXX We must do this before trying to match the index to index
>> *     columns, because the index predicates may use columns not
>> *     used in the index itself.
>
> Anyway some description on rclauses should be added in the
> comment for match_clause_to_index, instead of the comments
> currently added *within* the function.

OK, will do.


kind regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

01 октября 2015 г., 02:37:00


On 09/30/2015 12:55 PM, Tomas Vondra wrote:
> Hello!
>
> On 09/30/2015 10:29 AM, Kyotaro HORIGUCHI wrote:
>
>> By the way your comment for indexrinfos is as following,
>>
>>> * 'indexrinfos' is a list of RestrictInfo nodes from the query's WHERE
>>> * or JOIN conditions, excluding those implied by the index predicate
>>> * (if the index is not partial, the list includes all restriction
>>> clauses).
>>
>> But the v4 patch instead leaves it empty for non-partial
>> indexes:) I prefer to follow this comment because looking the
>> condition (index->indpred != NIL) for such purpose in
>> build_index_paths is somewhat uneasy for me. But I don't insist
>> on that if you choose to avoid useless memory and clock
>> consumption to construct a list which is not so meaningful for
>> non-partial indexes (it is almost all cases).
>
> Good point. I think we may simply point indexrinfos to the existing list
> of restrictions in that case - we don't need to copy it. So no
> additional memory / CPU consumption, and it should make the code working
> with it a bit simpler.

Attached is v5 of the patch improving the comments (rephrasing, moving
before function etc.).

I also attempted to fix the issue with empty list for non-partial
indexes by simply setting it to rel->baserestrictinfo in
match_restriction_clauses_to_index (for non-partial indexes), but that
results in a bunch of

    ERROR:  variable not found in subplan target list

during "make installcheck". I can't quite wrap my head around why.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

partial-index-only-scan-v5.patch

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

06 октября 2015 г., 09:12:26

Hello,

At Thu, 01 Oct 2015 01:36:51 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<560C7213.3010203@2ndquadrant.com>
> > Good point. I think we may simply point indexrinfos to the existing
> > list
> > of restrictions in that case - we don't need to copy it. So no
> > additional memory / CPU consumption, and it should make the code
> > working
> > with it a bit simpler.
> 
> Attached is v5 of the patch improving the comments (rephrasing, moving
> before function etc.).

Looks good (for me).

> I also attempted to fix the issue with empty list for non-partial
> indexes by simply setting it to rel->baserestrictinfo in
> match_restriction_clauses_to_index (for non-partial indexes),

Although it is not the cause of these errors (or any errors on
regress), build_paths_for_OR (which doesn't seem to be called in
regression tests) doesn't set indexrinfos
properly. match_clauses_to_index can commonize(?) the process in
return for additional list copy and concat.  The attached diff is
doing in the latter method. Avoiding the copies will make the
code a bit complicated.

But anyway regtests doesn't say whether it is sane or not:( (TODO)

> but that
> results in a bunch of
> 
>    ERROR:  variable not found in subplan target list
> 
> during "make installcheck". I can't quite wrap my head around why.

During considerartion on parameterized joins in
get_join_index_paths, caluseset fed to get_index_paths is
generated from given join (j|e)clausesets. So completing the
clauseset with necessary restrictinfo from rcaluseset fixes the
problem.

The attached patch applies on the latest v5 patch and will
address above issues. (And modifies expected files, which are the
manifestation of this improovement).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c5c2b6e..105351f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -662,6 +662,13 @@ get_join_index_paths(PlannerInfo *root, RelOptInfo *rel,    /* We should have found something,
elsecaller passed silly relids */    Assert(clauseset.nonempty);
 
+    /*
+     * get_index_paths requires that restririctinfo for the index is stored in
+     * clauseset->indexrinfos. Since it doesn't contain join clauses, just
+     * copying that of rclauseset is enough.
+     */
+    clauseset.indexrinfos = rclauseset->indexrinfos;
+    /* Build index path(s) using the collected set of clauses */    get_index_paths(root, rel, index, &clauseset,
bitindexpaths);
@@ -867,7 +874,6 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,    double        loop_count;    List
*orderbyclauses;   List       *orderbyclausecols;
 
-    List       *restrictinfo;    List       *index_pathkeys;    List       *useful_pathkeys;    bool
found_lower_saop_clause;
@@ -1015,16 +1021,13 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,        orderbyclausecols = NIL;    }
-    restrictinfo
-        = (index->indpred != NIL) ? clauses->indexrinfos : rel->baserestrictinfo;
-    /*     * 3. Check if an index-only scan is possible.  If we're not building     * plain indexscans, this isn't
relevantsince bitmap scans don't support     * index data retrieval anyway.     */    index_only_scan = (scantype !=
ST_BITMAPSCAN&&
 
-                       check_index_only(rel, index, restrictinfo));
+                       check_index_only(rel, index, clauses->indexrinfos));    /*     * 4. Generate an indexscan path
ifthere are relevant restriction clauses
 
@@ -1038,7 +1041,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,        ipath = create_index_path(root,
index,                                 index_clauses,                                  clause_columns,
 
-                                  restrictinfo,
+                                  clauses->indexrinfos,                                  orderbyclauses,
                  orderbyclausecols,                                  useful_pathkeys,
 
@@ -1065,7 +1068,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,            ipath = create_index_path(root,
index,                                     index_clauses,                                      clause_columns,
 
-                                      restrictinfo,
+                                      clauses->indexrinfos,                                      NIL,
                   NIL,                                      useful_pathkeys,
 
@@ -2121,6 +2124,11 @@ match_clauses_to_index(IndexOptInfo *index,        Assert(IsA(rinfo, RestrictInfo));
match_clause_to_index(index,rinfo, clauseset);    }
 
+
+    /* copy clauses so that it won't be broken on concatenation afterward */
+    if (index->indpred == NIL)
+        clauseset->indexrinfos =
+            list_concat(clauseset->indexrinfos,    list_copy(clauses));}/*
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index de826b5..88f6a02 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -780,7 +780,6 @@ explain (costs off)                 ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2
                    Index Cond: (f1 IS NOT NULL)                 ->  Index Only Scan using minmaxtest3i on minmaxtest3
 
-                       Index Cond: (f1 IS NOT NULL)   InitPlan 2 (returns $1)     ->  Limit           ->  Merge
Append
@@ -792,8 +791,7 @@ explain (costs off)                 ->  Index Only Scan using minmaxtest2i on minmaxtest2
minmaxtest2_1                      Index Cond: (f1 IS NOT NULL)                 ->  Index Only Scan Backward using
minmaxtest3ion minmaxtest3 minmaxtest3_1
 
-                       Index Cond: (f1 IS NOT NULL)
-(25 rows)
+(23 rows)select min(f1), max(f1) from minmaxtest; min | max 
@@ -819,7 +817,6 @@ explain (costs off)                 ->  Index Only Scan Backward using minmaxtest2i on minmaxtest2
                    Index Cond: (f1 IS NOT NULL)                 ->  Index Only Scan using minmaxtest3i on minmaxtest3
 
-                       Index Cond: (f1 IS NOT NULL)   InitPlan 2 (returns $1)     ->  Limit           ->  Merge
Append
@@ -831,9 +828,8 @@ explain (costs off)                 ->  Index Only Scan using minmaxtest2i on minmaxtest2
minmaxtest2_1                      Index Cond: (f1 IS NOT NULL)                 ->  Index Only Scan Backward using
minmaxtest3ion minmaxtest3 minmaxtest3_1
 
-                       Index Cond: (f1 IS NOT NULL)   ->  Result
-(27 rows)
+(25 rows)select distinct min(f1), max(f1) from minmaxtest; min | max

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

08 октября 2015 г., 08:30:55

Hello,

> The attached patch applies on the latest v5 patch and will
> address above issues. (And modifies expected files, which are the
> manifestation of this improovement).

As you see, it is a quite bad choice. Ugly and unreadable and
fragile.

The cause of this seeming mismatch would be the place to hold
indexrinfos. It is determined only by baserestrictinfo and
indpred. Any other components are not involved. So IndexClauseSet
is found not to be the best place after all, I suppose.

Instead, I came to think that the better place is
IndexOptInfo. Partial indexes are examined in check_partial_index
and it seems to be the most proper place to check this so far.

Thougts?

regards,


At Tue, 06 Oct 2015 15:12:02 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20151006.151202.58051959.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello,
> 
> At Thu, 01 Oct 2015 01:36:51 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<560C7213.3010203@2ndquadrant.com>
> > > Good point. I think we may simply point indexrinfos to the existing
> > > list
> > > of restrictions in that case - we don't need to copy it. So no
> > > additional memory / CPU consumption, and it should make the code
> > > working
> > > with it a bit simpler.
> > 
> > Attached is v5 of the patch improving the comments (rephrasing, moving
> > before function etc.).
> 
> Looks good (for me).
> 
> > I also attempted to fix the issue with empty list for non-partial
> > indexes by simply setting it to rel->baserestrictinfo in
> > match_restriction_clauses_to_index (for non-partial indexes),
> 
> Although it is not the cause of these errors (or any errors on
> regress), build_paths_for_OR (which doesn't seem to be called in
> regression tests) doesn't set indexrinfos
> properly. match_clauses_to_index can commonize(?) the process in
> return for additional list copy and concat.  The attached diff is
> doing in the latter method. Avoiding the copies will make the
> code a bit complicated.
> 
> But anyway regtests doesn't say whether it is sane or not:( (TODO)
> 
> > but that
> > results in a bunch of
> > 
> >    ERROR:  variable not found in subplan target list
> > 
> > during "make installcheck". I can't quite wrap my head around why.
> 
> During considerartion on parameterized joins in
> get_join_index_paths, caluseset fed to get_index_paths is
> generated from given join (j|e)clausesets. So completing the
> clauseset with necessary restrictinfo from rcaluseset fixes the
> problem.
> 
> The attached patch applies on the latest v5 patch and will
> address above issues. (And modifies expected files, which are the
> manifestation of this improovement).

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

08 октября 2015 г., 16:24:45


On 10/08/2015 07:30 AM, Kyotaro HORIGUCHI wrote:
> Hello,
>
>> The attached patch applies on the latest v5 patch and will
>> address above issues. (And modifies expected files, which are the
>> manifestation of this improovement).
>
> As you see, it is a quite bad choice. Ugly and unreadable and
> fragile.

I suppose you mean placing the list into IndexClauseSet?

>
> The cause of this seeming mismatch would be the place to hold
> indexrinfos. It is determined only by baserestrictinfo and
> indpred. Any other components are not involved. So IndexClauseSet
> is found not to be the best place after all, I suppose.
>
> Instead, I came to think that the better place is
> IndexOptInfo. Partial indexes are examined in check_partial_index
> and it seems to be the most proper place to check this so far.

AFAIK there's only one IndexOptInfo instance per index, so I'm not sure 
how would that work with queries that use the index in multiple places? 
Imagine for example table joined to itself, where both sides use the 
index with different conditions.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

09 октября 2015 г., 03:59:59

Hello,

At Thu, 08 Oct 2015 15:24:35 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<56166E93.8000505@2ndquadrant.com>
> 
> 
> On 10/08/2015 07:30 AM, Kyotaro HORIGUCHI wrote:
> > Hello,
> >
> >> The attached patch applies on the latest v5 patch and will
> >> address above issues. (And modifies expected files, which are the
> >> manifestation of this improovement).
> >
> > As you see, it is a quite bad choice. Ugly and unreadable and
> > fragile.
> 
> I suppose you mean placing the list into IndexClauseSet?

No, it is a reasonable choice, It's not bad if it has valid
clauses only for partial indexes.  What is Ugl.. is my additional
patch:(, which let IndexClauseSet to have valid clauses.

> > The cause of this seeming mismatch would be the place to hold
> > indexrinfos. It is determined only by baserestrictinfo and
> > indpred. Any other components are not involved. So IndexClauseSet
> > is found not to be the best place after all, I suppose.
> >
> > Instead, I came to think that the better place is
> > IndexOptInfo. Partial indexes are examined in check_partial_index
> > and it seems to be the most proper place to check this so far.
> 
> AFAIK there's only one IndexOptInfo instance per index, so I'm not
> sure how would that work with queries that use the index in multiple
> places?

No matter if the index is used multiple places, indexrinfos is
determined only with baserestrictinfos of the owner relation and
itself's indpred, which are invariant through the following steps.

One arguable point on the place is that check_partial_indexes
could be called again with longer restrictions (by eclass
claues), but the comment suggests that the last call will be
valid in the following steps so creating indexrinfos in every
calls should be valid.

However possibly a bit innefficient, we can choose to use the
first-created indexrinfos, which would be longer than that could
be re-created.

> Imagine for example table joined to itself, where both sides
> use the index with different conditions.

Such table appears as multiple baserels having maybe different
baserestrictinfo, so no problme on the case.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

09 октября 2015 г., 17:32:41

Hello,

On 10/09/2015 02:59 AM, Kyotaro HORIGUCHI wrote:
>>> The cause of this seeming mismatch would be the place to hold
>>> indexrinfos. It is determined only by baserestrictinfo and
>>> indpred. Any other components are not involved. So IndexClauseSet
>>> is found not to be the best place after all, I suppose.
>>>
>>> Instead, I came to think that the better place is
>>> IndexOptInfo. Partial indexes are examined in check_partial_index
>>> and it seems to be the most proper place to check this so far.
>>
>> AFAIK there's only one IndexOptInfo instance per index, so I'm not
>> sure how would that work with queries that use the index in multiple
>> places?
>
> No matter if the index is used multiple places, indexrinfos is
> determined only with baserestrictinfos of the owner relation and
> itself's indpred, which are invariant through the following steps.

I'm probably missing something, but let's say we have a table like this:

CREATE TABLE t (a INT, b INT, c INT);
CREATE INDEX aidx ON t(c) WHERE a = 1;
CREATE INDEX bidx ON t(c) WHERE b = 2;

and then a trivial query (where each part perfectly matches one of the 
indexes to allow IOS)

SELECT c FROM t WHERE a=1
UNION ALL
SELECT c FROM t WHERE b=2;

Now, let's say we move indexrinfos to IndexOptInfo - how will that look 
like for each index? There's only a single IndexOptInfo for each index, 
so it will have to work with union of all baserestrictinfos.

So we'll have these indexrinfos:

aidx: {b=2}
bidx: {a=1}

which makes index only scans unusable.

I think we effectively need a separate list of "not implied" clauses per 
index-baserel combination. Maybe IndexClauseSet is not the right place, 
but I don't see how IndexOptInfo could work.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

14 октября 2015 г., 09:06:50

Hello,

At Fri, 09 Oct 2015 16:32:31 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<5617CFFF.10606@2ndquadrant.com>
> Hello,
> 
> On 10/09/2015 02:59 AM, Kyotaro HORIGUCHI wrote:
> >>> The cause of this seeming mismatch would be the place to hold
> >>> indexrinfos. It is determined only by baserestrictinfo and
> >>> indpred. Any other components are not involved. So IndexClauseSet
> >>> is found not to be the best place after all, I suppose.
> >>>
> >>> Instead, I came to think that the better place is
> >>> IndexOptInfo. Partial indexes are examined in check_partial_index
> >>> and it seems to be the most proper place to check this so far.
> >>
> >> AFAIK there's only one IndexOptInfo instance per index, so I'm not
> >> sure how would that work with queries that use the index in multiple
> >> places?
> >
> > No matter if the index is used multiple places, indexrinfos is
> > determined only with baserestrictinfos of the owner relation and
> > itself's indpred, which are invariant through the following steps.
> 
> I'm probably missing something, but let's say we have a table like
> this:

You might be missing the fact that a table could represented as
multiple relation(RelOptInfo)s in PlannerInfo or PlannerGlobal.

> CREATE TABLE t (a INT, b INT, c INT);
> CREATE INDEX aidx ON t(c) WHERE a = 1;
> CREATE INDEX bidx ON t(c) WHERE b = 2;
> 
> and then a trivial query (where each part perfectly matches one of the
> indexes to allow IOS)
> 
> SELECT c FROM t WHERE a=1
> UNION ALL
> SELECT c FROM t WHERE b=2;
> 
> Now, let's say we move indexrinfos to IndexOptInfo - how will that
> look like for each index? There's only a single IndexOptInfo for each
> index, so it will have to work with union of all baserestrictinfos.

Needless to say about IndexOptInfo, the two t's in the two
component SELECTS are represented as two different subquery rels
having different baserestrictinfo. So it will correctly be
planned as the following with my previous patch.

Append  (cost=0.12..64.66 rows=20 width=4)  ->  Index Only Scan using aidx on t  (cost=0.12..32.23 rows=10 width=4)  ->
Index Only Scan using bidx on t t_1  (cost=0.12..32.23 rows=10 width=4)
 
(3 rows)

The table t is referred to twice by different (alias) names
(though the diferrence is made by EXPLAIN, it shows that they are
different rels in plantree).

> So we'll have these indexrinfos:
> 
> aidx: {b=2}
> bidx: {a=1}

Yes, but each of them belongs to different rels. So,

> which makes index only scans unusable.

The are usable.

> I think we effectively need a separate list of "not implied" clauses
> per index-baserel combination.
> Maybe IndexClauseSet is not the right
> place, but I don't see how IndexOptInfo could work.

Does this make sense?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

07 декабря 2015 г., 01:49:03

Hello Kyotaro-san,

Sorry for the long delay since your response in this thread :-(

On 10/14/2015 08:06 AM, Kyotaro HORIGUCHI wrote:
>
> The table t is referred to twice by different (alias) names (though
> the diferrence is made by EXPLAIN, it shows that they are different
> rels in plantree).
>
>> So we'll have these indexrinfos:
>>
>> aidx: {b=2}
>> bidx: {a=1}
>
> Yes, but each of them belongs to different rels. So,
>
>> which makes index only scans unusable.
>
> The are usable.

Yes, you're of course right. I got confused by the comment at 
IndexOptInfo that says "per-index information" but as you've pointed out 
it's really "per-index per-reloptinfo" which makes it perfectly suitable 
for keeping the info we need.

However, I'm not sure the changes to check_partial_indexes() are correct 
- particularly the first loop.
  /*   * Frequently, there will be no partial indexes, so first check to   * make sure there's something useful to do
here.  */  have_partial = false;  foreach(lc, rel->indexlist)  {    IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);

    /*     * index rinfos are the same to baseristrict infos for non-partial     * indexes     */    index->indrinfos =
rel->baserestrictinfo;
    if (index->indpred == NIL)      continue;      /* ignore non-partial indexes */
    if (index->predOK)      continue;      /* don't repeat work if already proven OK */
    have_partial = true;    break;  }

Now, imagine we have two indexes - partial and regular. In such case the 
loop will terminate after the first index (assuming predOK=true), so the 
regular index won't have indrinfos set. I think we need to remove the 
'break' at the end.

BTW it took me a while to understand the change at the end:
  /* Search for the first rinfo that is implied by indpred */  foreach (lcr, rel->baserestrictinfo)  {    RestrictInfo
*rinfo= (RestrictInfo *) lfirst(lcr);

    if (predicate_implied_by(list_make1(rinfo->clause),                 index->indpred))      break;  }
  /* This index needs rinfos different from baserestrictinfo */  if (lcr)  {    ... filter implied conditions ...  }

Is there a reason why not to simply move the second block into the if 
block in the foreach loop like this?
  /* Search for the first rinfo that is implied by indpred */  foreach (lcr, rel->baserestrictinfo)  {    RestrictInfo
*rinfo= (RestrictInfo *) lfirst(lcr);

    if (predicate_implied_by(list_make1(rinfo->clause),                 index->indpred))    {      ... filter implied
conditions...      break;    }  }

Otherwise the reworked patch seems fine to me, but I'll give it a bit 
more testing over the next few days.

Thanks for the help so far!

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Michael Paquier

Дата:

24 декабря 2015 г., 06:05:13

On Mon, Dec 7, 2015 at 7:48 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Hello Kyotaro-san,
>
> Sorry for the long delay since your response in this thread :-(
>
> On 10/14/2015 08:06 AM, Kyotaro HORIGUCHI wrote:
>>
>>
>> The table t is referred to twice by different (alias) names (though
>> the diferrence is made by EXPLAIN, it shows that they are different
>> rels in plantree).
>>
>>> So we'll have these indexrinfos:
>>>
>>> aidx: {b=2}
>>> bidx: {a=1}
>>
>>
>> Yes, but each of them belongs to different rels. So,
>>
>>> which makes index only scans unusable.
>>
>>
>> The are usable.
>
>
> Yes, you're of course right. I got confused by the comment at IndexOptInfo
> that says "per-index information" but as you've pointed out it's really
> "per-index per-reloptinfo" which makes it perfectly suitable for keeping the
> info we need.
>
> However, I'm not sure the changes to check_partial_indexes() are correct -
> particularly the first loop.
>
>   /*
>    * Frequently, there will be no partial indexes, so first check to
>    * make sure there's something useful to do here.
>    */
>   have_partial = false;
>   foreach(lc, rel->indexlist)
>   {
>     IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
>
>     /*
>      * index rinfos are the same to baseristrict infos for non-partial
>      * indexes
>      */
>     index->indrinfos = rel->baserestrictinfo;
>
>     if (index->indpred == NIL)
>       continue;      /* ignore non-partial indexes */
>
>     if (index->predOK)
>       continue;      /* don't repeat work if already proven OK */
>
>     have_partial = true;
>     break;
>   }
>
> Now, imagine we have two indexes - partial and regular. In such case the
> loop will terminate after the first index (assuming predOK=true), so the
> regular index won't have indrinfos set. I think we need to remove the
> 'break' at the end.
>
> BTW it took me a while to understand the change at the end:
>
>   /* Search for the first rinfo that is implied by indpred */
>   foreach (lcr, rel->baserestrictinfo)
>   {
>     RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr);
>
>     if (predicate_implied_by(list_make1(rinfo->clause),
>                  index->indpred))
>       break;
>   }
>
>   /* This index needs rinfos different from baserestrictinfo */
>   if (lcr)
>   {
>     ... filter implied conditions ...
>   }
>
> Is there a reason why not to simply move the second block into the if block
> in the foreach loop like this?
>
>   /* Search for the first rinfo that is implied by indpred */
>   foreach (lcr, rel->baserestrictinfo)
>   {
>     RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr);
>
>     if (predicate_implied_by(list_make1(rinfo->clause),
>                  index->indpred))
>     {
>       ... filter implied conditions ...
>       break;
>     }
>   }
>
>
> Otherwise the reworked patch seems fine to me, but I'll give it a bit more
> testing over the next few days.
>
> Thanks for the help so far!

Tomas, are you still working on that? This thread is stalling for 3 weeks.
-- 
Michael

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

24 декабря 2015 г., 06:30:01

Hi,

On 12/24/2015 04:05 AM, Michael Paquier wrote:
> On Mon, Dec 7, 2015 at 7:48 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:

...

>>
>> Otherwise the reworked patch seems fine to me, but I'll give it a bit more
>> testing over the next few days.
>>
>> Thanks for the help so far!
>
> Tomas, are you still working on that? This thread is stalling for 3 weeks.


I haven't discovered anything interesting during the testing, so I guess 
the "needs review" state is appropriate. Let's move the patch to the 
next commitfest.

regards


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Alvaro Herrera

Дата:

31 января 2016 г., 15:34:42

Tomas Vondra wrote:

> On 12/24/2015 04:05 AM, Michael Paquier wrote:

> >Tomas, are you still working on that? This thread is stalling for 3 weeks.
> 
> I haven't discovered anything interesting during the testing, so I guess the
> "needs review" state is appropriate. Let's move the patch to the next
> commitfest.

Not sure what to do here, since this patch got no feedback at all in
this CF.  The right thing to do, ISTM, is to just move it again to the
next CF.  But it'd be really useful if someone can have it a look and
verify at least whether it doesn't need a rebase (requiring a further
submission) so that other people can play with it.  Of course, if
Horiguchi-san or anyone has more review comments, that would be even
better.  

Tomas said he'd do more testing, but we never got a report on whether
anything turned up.

(At this point I'm not sure if either Kyotaro or Tomas should be
considered the patch author ... maybe both?)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Konstantin Knizhnik

Дата:

31 января 2016 г., 18:44:13

I am very interested in this patch because it allows to use partial indexes to ... speed up inserts.
I have implemented "ALTER INDEX ... WHERE ..." construction which allows to change predicate of partial index without
necessityto fully rebuild it.

So it is not necessary to insert new records in index immediately (if new records do not match partial index
conditions).
It can be done later in background (or at night). My experiments show that it allows to increase insert speed five
times(for either partial indexes).

At the same time we do not loose RDBMS requirement that result of query should not depend on presence of indexes. And
itis applicable to all indexes: B-Tree, GIN, GIST,...

But such optimization makes sense only of partial indexes can be used without extra overhead, first of all for
index-onlyscans.

And it is impossible without this patch.

On 01/31/2016 03:34 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>> On 12/24/2015 04:05 AM, Michael Paquier wrote:
>>> Tomas, are you still working on that? This thread is stalling for 3 weeks.
>> I haven't discovered anything interesting during the testing, so I guess the
>> "needs review" state is appropriate. Let's move the patch to the next
>> commitfest.
> Not sure what to do here, since this patch got no feedback at all in
> this CF.  The right thing to do, ISTM, is to just move it again to the
> next CF.  But it'd be really useful if someone can have it a look and
> verify at least whether it doesn't need a rebase (requiring a further
> submission) so that other people can play with it.  Of course, if
> Horiguchi-san or anyone has more review comments, that would be even
> better.
>
> Tomas said he'd do more testing, but we never got a report on whether
> anything turned up.
>
> (At this point I'm not sure if either Kyotaro or Tomas should be
> considered the patch author ... maybe both?)
>

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: PATCH: index-only scans with partial indexes

От

Alvaro Herrera

Дата:

01 февраля 2016 г., 01:11:45

Konstantin Knizhnik wrote:
> I am very interested in this patch because it allows to use partial indexes to ... speed up inserts.
> I have implemented "ALTER INDEX ... WHERE ..." construction which allows to change predicate of partial index without
necessityto fully rebuild it.

> So it is not necessary to insert new records in index immediately (if new records do not match partial index
conditions).
> It can be done later in background (or at night). My experiments show that it allows to increase insert speed five
times(for either partial indexes).

> At the same time we do not loose RDBMS requirement that result of query should not depend on presence of indexes. And
itis applicable to all indexes: B-Tree, GIN, GIST,...

> 
> But such optimization makes sense only of partial indexes can be used without extra overhead, first of all for
index-onlyscans.

> And it is impossible without this patch.

That sounds interesting.  So please review this patch and let us know
whether you like it, or whether you have any better ideas for any
particular hunk, or whether you think it should be rewritten from
scratch, or just state that it is perfect.  If you think it's useful,
then it's a good idea to vouch for it to be integrated; and one way of
doing that is making sure it meets the quality standards etc.  If you
don't say anything about the patch, we may never integrate it because we
might have doubts about whether it's worthy.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Konstantin Knizhnik

Дата:

03 февраля 2016 г., 10:37:29

I have applied this patch to our working branch and  during several 
weeks we ran various tests and benchmarks.
We have not noticed any problems or performance degradation.
And at some queries this patch cause very significant increase of 
performance - ten times:

With this patch:

postgres=# explain analyze select count(*) from t where k1<1000000 and 
pk < 1454434742881892;                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=29.65..29.66 rows=1 width=0) (actual 
 
time=0.108..0.108 rows=1 loops=1)   ->  Index Only Scan using idx1 on t  (cost=0.43..27.49 rows=861 
width=0) (actual time=0.012..0.071 rows=963 loops=1)         Index Cond: (k1 < 1000000)         Heap Fetches: 0
Planningtime: 0.100 ms Execution time: 0.121 ms
 
(6 rows)


Without patch:

postgres=# explain analyze select count(*) from t where k1<1000000 and 
pk < 1454434742881892;                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2951.55..2951.56 rows=1 width=0) (actual 
 
time=1.070..1.070 rows=1 loops=1)   ->  Bitmap Heap Scan on t  (cost=19.10..2949.40 rows=861 width=0) 
(actual time=0.161..0.997 rows=963 loops=1)         Recheck Cond: ((k1 < 1000000) AND (pk < 
'1454434742881892'::bigint))         Heap Blocks: exact=954         ->  Bitmap Index Scan on idx1  (cost=0.00..18.88
rows=861
 
width=0) (actual time=0.083..0.083 rows=963 loops=1)               Index Cond: (k1 < 1000000) Planning time: 0.099 ms
Executiontime: 1.089 ms
 
(8 rows)




On 01.02.2016 01:11, Alvaro Herrera wrote:
> Konstantin Knizhnik wrote:
>> I am very interested in this patch because it allows to use partial indexes to ... speed up inserts.
>> I have implemented "ALTER INDEX ... WHERE ..." construction which allows to change predicate of partial index
withoutnecessity to fully rebuild it.
 
>> So it is not necessary to insert new records in index immediately (if new records do not match partial index
conditions).
>> It can be done later in background (or at night). My experiments show that it allows to increase insert speed five
times(for either partial indexes).
 
>> At the same time we do not loose RDBMS requirement that result of query should not depend on presence of indexes.
Andit is applicable to all indexes: B-Tree, GIN, GIST,...
 
>>
>> But such optimization makes sense only of partial indexes can be used without extra overhead, first of all for
index-onlyscans.
 
>> And it is impossible without this patch.
> That sounds interesting.  So please review this patch and let us know
> whether you like it, or whether you have any better ideas for any
> particular hunk, or whether you think it should be rewritten from
> scratch, or just state that it is perfect.  If you think it's useful,
> then it's a good idea to vouch for it to be integrated; and one way of
> doing that is making sure it meets the quality standards etc.  If you
> don't say anything about the patch, we may never integrate it because we
> might have doubts about whether it's worthy.
>

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

24 февраля 2016 г., 03:13:36

Hi,

On 12/06/2015 11:48 PM, Tomas Vondra wrote:
>    /*
>     * Frequently, there will be no partial indexes, so first check to
>     * make sure there's something useful to do here.
>     */
>    have_partial = false;
>    foreach(lc, rel->indexlist)
>    {
>      IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
>
>      /*
>       * index rinfos are the same to baseristrict infos for non-partial
>       * indexes
>       */
>      index->indrinfos = rel->baserestrictinfo;
>
>      if (index->indpred == NIL)
>        continue;      /* ignore non-partial indexes */
>
>      if (index->predOK)
>        continue;      /* don't repeat work if already proven OK */
>
>      have_partial = true;
>      break;
>    }

Attached is a v6 of the patch, which is actually the version submitted
by Kyotaro-san on 2015/10/8 rebased to current master and with two
additional changes.

Firstly, I've removed the "break" from the initial foreach loop in
check_partial_indexes(). As explained in the previous message, I believe
this was a bug in the patch.

Secondly, I've tried to improve the comments to explain a bit better
what the code is doing.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

partial-index-only-scan-v6.patch

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

25 февраля 2016 г., 13:57:27

Hello, Tomas. my cerebral cortext gets squeezed by jumping from
parser to planner.

At Wed, 24 Feb 2016 01:13:22 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<56CCF5A2.5040702@2ndquadrant.com>
> Hi,
> 
> On 12/06/2015 11:48 PM, Tomas Vondra wrote:
> >    /*
> >     * Frequently, there will be no partial indexes, so first check to
> >     * make sure there's something useful to do here.
> >     */
> >    have_partial = false;
> >    foreach(lc, rel->indexlist)
> >    {
> >      IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
> >
> >      /*
> >       * index rinfos are the same to baseristrict infos for non-partial
> >       * indexes
> >       */
> >      index->indrinfos = rel->baserestrictinfo;
> >
> >      if (index->indpred == NIL)
> >        continue;      /* ignore non-partial indexes */
> >
> >      if (index->predOK)
> >        continue;      /* don't repeat work if already proven OK */
> >
> >      have_partial = true;
> >      break;
> >    }
> 
> Attached is a v6 of the patch, which is actually the version submitted
> by Kyotaro-san on 2015/10/8 rebased to current master and with two
> additional changes.

This relies on the fact I believe that no indexrelinfos are
shared among any two baserels. Are you OK with that?

> Firstly, I've removed the "break" from the initial foreach loop in
> check_partial_indexes(). As explained in the previous message, I
> believe this was a bug in the patch.

I agreed. The break is surely a bug. (the regression didn't find
it though..)

> Secondly, I've tried to improve the comments to explain a bit better
> what the code is doing.

In check_partial_indexes, the following commnet.

>   /*
>    * Update restrictinfo for this index by excluding clauses that
>    * are implied by the index predicates. We first quickly check if
>    * there are any implied clauses, and if we found at least one
>    * we do the actual work. This way we don't need the construct
>    * a copy of the list unless actually needed.
>    */

Is "need the construct" a mistake of "need to construct" ?


match_restriction_clauses_to_index is called only from
create_index_paths, and now the former doesn't use its first
parameter rel. We can safely remove the useless parameter.

-  match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo *index,
-                                     IndexClauseSet *clauseset)
+  match_restriction_clauses_to_index(IndexOptInfo *index,
+                                     IndexClauseSet *clauseset)

I find no other problem and have no objection to this. May I mark
this "Ready for committer" after fixing them?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

25 февраля 2016 г., 14:22:54

Hi,

On 02/25/2016 11:56 AM, Kyotaro HORIGUCHI wrote:
> Hello, Tomas. my cerebral cortext gets squeezed by jumping from
> parser to planner.

LOL

>
> At Wed, 24 Feb 2016 01:13:22 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<56CCF5A2.5040702@2ndquadrant.com>
>> Hi,
>>
>> On 12/06/2015 11:48 PM, Tomas Vondra wrote:
>>>     /*
>>>      * Frequently, there will be no partial indexes, so first check to
>>>      * make sure there's something useful to do here.
>>>      */
>>>     have_partial = false;
>>>     foreach(lc, rel->indexlist)
>>>     {
>>>       IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
>>>
>>>       /*
>>>        * index rinfos are the same to baseristrict infos for non-partial
>>>        * indexes
>>>        */
>>>       index->indrinfos = rel->baserestrictinfo;
>>>
>>>       if (index->indpred == NIL)
>>>         continue;      /* ignore non-partial indexes */
>>>
>>>       if (index->predOK)
>>>         continue;      /* don't repeat work if already proven OK */
>>>
>>>       have_partial = true;
>>>       break;
>>>     }
>>
>> Attached is a v6 of the patch, which is actually the version
>> submitted by Kyotaro-san on 2015/10/8 rebased to current master and
>> with two additional changes.
>
> This relies on the fact I believe that no indexrelinfos are
> shared among any two baserels. Are you OK with that?

I'm not sure what this refers to? You mean the fact that an index will 
have one IndexOptInfo instance for each baserel? Yes, that seems fine to me.

>
>> Firstly, I've removed the "break" from the initial foreach loop in
>> check_partial_indexes(). As explained in the previous message, I
>> believe this was a bug in the patch.
>
> I agreed. The break is surely a bug. (the regression didn't find
> it though..)
>
>> Secondly, I've tried to improve the comments to explain a bit better
>> what the code is doing.
>
> In check_partial_indexes, the following commnet.
>
>>    /*
>>     * Update restrictinfo for this index by excluding clauses that
>>     * are implied by the index predicates. We first quickly check if
>>     * there are any implied clauses, and if we found at least one
>>     * we do the actual work. This way we don't need the construct
>>     * a copy of the list unless actually needed.
>>     */
>
> Is "need the construct" a mistake of "need to construct" ?
>
>
> match_restriction_clauses_to_index is called only from
> create_index_paths, and now the former doesn't use its first
> parameter rel. We can safely remove the useless parameter.
>
> -  match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo *index,
> -                                     IndexClauseSet *clauseset)
> +  match_restriction_clauses_to_index(IndexOptInfo *index,
> +                                     IndexClauseSet *clauseset)
>
> I find no other problem and have no objection to this. May I mark
> this "Ready for committer" after fixing them?

OK. Do you want me to do the fixes, or will you do that?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

26 февраля 2016 г., 09:02:23

Hi,

At Thu, 25 Feb 2016 12:22:45 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<56CEE405.30108@2ndquadrant.com>
> >> Attached is a v6 of the patch, which is actually the version
> >> submitted by Kyotaro-san on 2015/10/8 rebased to current master and
> >> with two additional changes.
> >
> > This relies on the fact I believe that no indexrelinfos are
> > shared among any two baserels. Are you OK with that?
> 
> I'm not sure what this refers to? You mean the fact that an index will
> have one IndexOptInfo instance for each baserel? Yes, that seems fine
> to me.

Yes that what I meant.

> >> Firstly, I've removed the "break" from the initial foreach loop in
> >> check_partial_indexes(). As explained in the previous message, I
> >> believe this was a bug in the patch.
> >
> > I agreed. The break is surely a bug. (the regression didn't find
> > it though..)
> >
> >> Secondly, I've tried to improve the comments to explain a bit better
> >> what the code is doing.
> >
> > In check_partial_indexes, the following commnet.
> >
> >>    /*
> >>     * Update restrictinfo for this index by excluding clauses that
> >>     * are implied by the index predicates. We first quickly check if
> >>     * there are any implied clauses, and if we found at least one
> >>     * we do the actual work. This way we don't need the construct
> >>     * a copy of the list unless actually needed.
> >>     */
> >
> > Is "need the construct" a mistake of "need to construct" ?
> >
> >
> > match_restriction_clauses_to_index is called only from
> > create_index_paths, and now the former doesn't use its first
> > parameter rel. We can safely remove the useless parameter.
> >
> > -  match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo
> > -  *index,
> > -                                     IndexClauseSet *clauseset)
> > +  match_restriction_clauses_to_index(IndexOptInfo *index,
> > +                                     IndexClauseSet *clauseset)
> >
> > I find no other problem and have no objection to this. May I mark
> > this "Ready for committer" after fixing them?
> 
> OK. Do you want me to do the fixes, or will you do that?

Ah. I will do. Please wait a minute.

regares,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

26 февраля 2016 г., 10:07:11

Hello, This is a (maybe) committer-ready patch of a Tomas
Vondra's project.

This patch applies on the current master and passes make check.

This is to exclude some base-estrict clauses that are implied by
index predicates on index scans on partial indexes.

First, this patch adds a new member indexrinfos to IndexOptInfo,
which usually has the same value with baserestrictinfo of the
parent RelOptInfo and cost_index() uses this instead of
RelOptInfo.baserestrictinfo.

For partial indexes, the clauses implied by the index predicates
are removed from the indexrinfos, so that the result plan for
scans on such indexes won't contain such restrictions. Such
restrictions commonly become filter quals that are nothing but a
useless work to do, so this will improve the performance of some
index scans on partial indexes.

The largest part of the extra cost of the additional work would
be the cost of predicate_implied_by() on all clauses of
baserectrictinfo and indpred of every IndexOptInfos. The extra
work is done in check_partial_indexes() for all index scans on
partial indexes.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

26 февраля 2016 г., 10:18:54

I marked this as "ready for commiter" and tried to add me as the
*second* author. But the CF app forces certain msyterious order
for listed names. Is there any means to arrange the author names
in desired order?

At Fri, 26 Feb 2016 16:06:37 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20160226.160637.65689630.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello, This is a (maybe) committer-ready patch of a Tomas
> Vondra's project.
> 
> This patch applies on the current master and passes make check.
> 
> This is to exclude some base-estrict clauses that are implied by
> index predicates on index scans on partial indexes.
> 
> First, this patch adds a new member indexrinfos to IndexOptInfo,
> which usually has the same value with baserestrictinfo of the
> parent RelOptInfo and cost_index() uses this instead of
> RelOptInfo.baserestrictinfo.
> 
> For partial indexes, the clauses implied by the index predicates
> are removed from the indexrinfos, so that the result plan for
> scans on such indexes won't contain such restrictions. Such
> restrictions commonly become filter quals that are nothing but a
> useless work to do, so this will improve the performance of some
> index scans on partial indexes.
> 
> The largest part of the extra cost of the additional work would
> be the cost of predicate_implied_by() on all clauses of
> baserectrictinfo and indpred of every IndexOptInfos. The extra
> work is done in check_partial_indexes() for all index scans on
> partial indexes.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Michael Paquier

Дата:

26 февраля 2016 г., 15:46:23

On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I marked this as "ready for commiter" and tried to add me as the
> *second* author. But the CF app forces certain msyterious order
> for listed names. Is there any means to arrange the author names
> in desired order?

Those are automatically classified by alphabetical order.
-- 
Michael

Re: PATCH: index-only scans with partial indexes

От

Robert Haas

Дата:

26 февраля 2016 г., 19:09:00

On Fri, Feb 26, 2016 at 6:16 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> I marked this as "ready for commiter" and tried to add me as the
>> *second* author. But the CF app forces certain msyterious order
>> for listed names. Is there any means to arrange the author names
>> in desired order?
>
> Those are automatically classified by alphabetical order.

Doh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PATCH: index-only scans with partial indexes

От

Michael Paquier

Дата:

27 февраля 2016 г., 15:49:13

On Sat, Feb 27, 2016 at 1:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 26, 2016 at 6:16 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>> I marked this as "ready for commiter" and tried to add me as the
>>> *second* author. But the CF app forces certain msyterious order
>>> for listed names. Is there any means to arrange the author names
>>> in desired order?
>>
>> Those are automatically classified by alphabetical order.
>
> Doh.

Hm? Not sure I am getting that. My point if that they could appear in
the order they have been entered by the user or the order they have
been saved, in which case one could take advantage of that to define
the a list of authors ordered by the amount of work they did.
-- 
Michael

Re: PATCH: index-only scans with partial indexes

От

Robert Haas

Дата:

27 февраля 2016 г., 17:58:37

On Sat, Feb 27, 2016 at 6:19 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Feb 27, 2016 at 1:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Feb 26, 2016 at 6:16 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI
>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>> I marked this as "ready for commiter" and tried to add me as the
>>>> *second* author. But the CF app forces certain msyterious order
>>>> for listed names. Is there any means to arrange the author names
>>>> in desired order?
>>>
>>> Those are automatically classified by alphabetical order.
>>
>> Doh.
>
> Hm? Not sure I am getting that. My point if that they could appear in
> the order they have been entered by the user or the order they have
> been saved, in which case one could take advantage of that to define
> the a list of authors ordered by the amount of work they did.

Well it seems to me it ought to let you specify the order.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PATCH: index-only scans with partial indexes

От

Tom Lane

Дата:

09 марта 2016 г., 02:09:04

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> Hello, This is a (maybe) committer-ready patch of a Tomas
> Vondra's project.

I think this needs quite a bit of work yet. A few comments:

* If we're going to pay the price of identifying implied restriction
conditions in check_partial_indexes(), we should at least recoup some
of that investment by not doing it again in create_indexscan_plan().

* create_indexscan_plan() has this comment about what it's doing: * We can also discard quals that are implied by a
partialindex's * predicate, but only in a plain SELECT; when scanning a target relation * of UPDATE/DELETE/SELECT
FORUPDATE, we must leave such quals in the * plan so that they'll be properly rechecked by EvalPlanQual testing.

I believe that that problem applies for this optimization as well,
and thus that you can only remove implied quals in plain SELECT.
At least, if there's some reason why that problem does *not* apply,
there had darn well better be a comment explaining why it's safe.

* Adding indexrinfos to IndexPath seems unnecessary, since that struct
already has the "index" pointer --- you can just get the field out of the
IndexOptInfo when you need it. If you insist on having the extra field,
this patch is short of the threshold for correctness on adding fields to
paths. It missed _outIndexPath for instance.

* The additional #include in costsize.c has no apparent reason.

* The changes in cost_index() really ought to come with some change
in the associated comments.

* Personally I'd not change the signature of
match_restriction_clauses_to_index; that's just code churn that
may have to get undone someday.

* The block comment in check_index_only needs more thought than this,
as the phrase "The same is true" is now invalid; the "same" it refers
to *isn't* the same anymore.

* I'm not too thrilled with injecting the initialization of
index->indrinfos into the initial loop in check_partial_indexes().
If it stays there, I'd certainly expect the comment ahead of the
loop to be changed to have something to do with reality. But can't
we find some more-appropriate place to initialize it? Like maybe
where the IndexOptInfo is first created? I would not really expect
check_partial_indexes() to have side-effects on non-partial indexes.

* I think the double loop in check_partial_indexes() is too cute by half.
I'd be inclined to just build the replacement list unconditionally while
we do the predicate_implied_by() tests. Those are expensive enough that
saving one lappend per implication-test is a useless optimization,
especially if it requires code as contorted and bug-prone as this.

* The comment added to IndexOptInfo is not very adequate, and not spelled
correctly either. There's a block comment you should be adding a para to
(probably take the text you added for struct IndexPath). And again,
there is more work to do to add a field to such a struct, eg outfuncs.c.
Usually a good way to find all the places to touch is to grep for some of
the existing field names in the struct.

* I don't much care for the field name "indrinfos"; it's neither very
readable nor descriptive. Don't have a better suggestion right now
though.

* Not sure if new regression test cases would be appropriate. The changes
in the existing cases seem a bit unfortunate actually; I'm afraid that
this may be defeating the original intent of those tests.

I'm setting this back to Waiting on Author.
regards, tom lane

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

09 марта 2016 г., 11:30:22

Hello, thank you for the comments. The new v8 patch is attched.

At Tue, 08 Mar 2016 18:08:55 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <21567.1457478535@sss.pgh.pa.us>
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> > Hello, This is a (maybe) committer-ready patch of a Tomas
> > Vondra's project.
> 
> I think this needs quite a bit of work yet.  A few comments:

Not a few at all.

> * If we're going to pay the price of identifying implied restriction
> conditions in check_partial_indexes(), we should at least recoup some
> of that investment by not doing it again in create_indexscan_plan().

Moved a part of the check from create_indexscan_plan into
check_partial_indexes. 
I noticed that we should avoid to exlude
clauses that contain mutable functions so I added that. But I
don't understand the reason for the following condition to refuse
clause pruning.

| rel->relid == root->parse->resultRelation

> * create_indexscan_plan() has this comment about what it's doing:
>      * We can also discard quals that are implied by a partial index's
>      * predicate, but only in a plain SELECT; when scanning a target relation
>      * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the
>      * plan so that they'll be properly rechecked by EvalPlanQual testing.
> I believe that that problem applies for this optimization as well,
> and thus that you can only remove implied quals in plain SELECT.
> At least, if there's some reason why that problem does *not* apply,
> there had darn well better be a comment explaining why it's safe.

This is done in check_partial_indexes using parse_rowmark.

The problem haven't realized with the previous patch because it
(accidentially) used rel->baserestirictinfo, not
index->indrinfos for scan_clauses in create_scan_plan.

But the way how create_scan_plan gets scan_clauses seems
bad. I haven't have any clean idea to deal with it.

> * Adding indexrinfos to IndexPath seems unnecessary, since that struct
> already has the "index" pointer --- you can just get the field out of the
> IndexOptInfo when you need it.  If you insist on having the extra field,
> this patch is short of the threshold for correctness on adding fields to
> paths.  It missed _outIndexPath for instance.

Sorry for the stupid code. I don't insist to keep it. Removed.

> * The additional #include in costsize.c has no apparent reason.

Thank you for pointing out. Removed.

> * The changes in cost_index() really ought to come with some change
> in the associated comments.

I tried to add a comment but it doesn't seem clear.

> * Personally I'd not change the signature of
> match_restriction_clauses_to_index; that's just code churn that
> may have to get undone someday.

No problem, reverted.

> * The block comment in check_index_only needs more thought than this,
> as the phrase "The same is true" is now invalid; the "same" it refers
> to *isn't* the same anymore.

Maybe I took this "the same" wrongly. Tried to fix it but I'm not
confident on the result.

> * I'm not too thrilled with injecting the initialization of
> index->indrinfos into the initial loop in check_partial_indexes().
> If it stays there, I'd certainly expect the comment ahead of the
> loop to be changed to have something to do with reality.  But can't
> we find some more-appropriate place to initialize it?  Like maybe
> where the IndexOptInfo is first created?  I would not really expect
> check_partial_indexes() to have side-effects on non-partial indexes.

Mmm. That is quote right in general. IndexOptInfo is created in
get_relation_info() but baserestrictinfo has not been fixed at
the point. It is fixed as late as set_append_rel_size, almost
just before set_rel_size, and just before the
check_partial_indexes. But initializing indrinfos as a
side-effect of check_partial_indexes is not good as you pointed.
But it is called in two ways, set_tablesample_rel_size and
set_plain_rel_size. So the only possible position of that other
than check_partial_indexes is set_rel_size.

> * I think the double loop in check_partial_indexes() is too cute by half.
> I'd be inclined to just build the replacement list unconditionally while
> we do the predicate_implied_by() tests.  Those are expensive enough that
> saving one lappend per implication-test is a useless optimization,
> especially if it requires code as contorted and bug-prone as this.

Ok, I removed the too cute part and added comment mentioning the
reason for the unconditional replacement.

> * The comment added to IndexOptInfo is not very adequate, and not spelled
> correctly either.  There's a block comment you should be adding a para to
> (probably take the text you added for struct IndexPath).  

I understand that you are mentioning here.

+  List     *indrinfos;    /* baseristrict info which are not implied by
+                           * indpred */

I rewritten to make sense, maybe.

> And again,
> there is more work to do to add a field to such a struct, eg outfuncs.c.
> Usually a good way to find all the places to touch is to grep for some of
> the existing field names in the struct.

Sorry, I just forgot of that. (In spite that I myself give such
kind of comments..) Yeah, I love find-grep on emacs.

By the way, I found this comment in copyfuncs.c but I couldn't
find the "subsidiary structs".

|  * We don't support copying RelOptInfo, IndexOptInfo, or Path nodes.
|  * There are some subsidiary structs that are useful to copy, though.

Finally, all I added for this was one line in _outIndexOptInfo.

> * I don't much care for the field name "indrinfos"; it's neither very
> readable nor descriptive.  Don't have a better suggestion right now
> though.

I agree with you. I didn't like the name so I rethought that. I
followed the seeming rule that prefixing with 'ind' to the field
name, but it is not for index, but for the parent relation. So I
renamed it as "baserestrictinfo" in this version.

> * Not sure if new regression test cases would be appropriate.  The changes
> in the existing cases seem a bit unfortunate actually; I'm afraid that
> this may be defeating the original intent of those tests.

Only aggregates.out is modifed in this patch. The comment for the
test says that,

> --
> -- Test cases that should be optimized into indexscans instead of
> -- the generic aggregate implementation.
...
> -- try it on an inheritance tree
...
> explain (costs off)
>   select min(f1), max(f1) from minmaxtest;

and 

> -- DISTINCT doesn't do anything useful here, but it shouldn't fail
> explain (costs off)
>   select distinct min(f1), max(f1) from minmaxtest;

Utterly no problem from the point of the comment. Although this
patch removes "Index Cond"s for the index minmaxtest3i, it is
simplly caused by a index predicate on the index, which is the
very result of this patch.

> I'm setting this back to Waiting on Author.

Attached the new version v8.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

09 марта 2016 г., 11:35:29

Sorry, I should correct one point.

At Wed, 09 Mar 2016 17:29:49 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20160309.172949.84135555.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello, thank you for the comments. The new v8 patch is attched.
> 
> At Tue, 08 Mar 2016 18:08:55 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <21567.1457478535@sss.pgh.pa.us>
> > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> > > Hello, This is a (maybe) committer-ready patch of a Tomas
> > > Vondra's project.
> > 
> > I think this needs quite a bit of work yet.  A few comments:
> 
> Not a few at all.
> 
> > * If we're going to pay the price of identifying implied restriction
> > conditions in check_partial_indexes(), we should at least recoup some
> > of that investment by not doing it again in create_indexscan_plan().
> 
> Moved a part of the check from create_indexscan_plan into
> check_partial_indexes. 
> 
>  I noticed that we should avoid to exlude
> clauses that contain mutable functions so I added that. But I
> don't understand the reason for the following condition to refuse
> clause pruning.
> 
> | rel->relid == root->parse->resultRelation
> 
> 
> > * create_indexscan_plan() has this comment about what it's doing:
> >      * We can also discard quals that are implied by a partial index's
> >      * predicate, but only in a plain SELECT; when scanning a target relation
> >      * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the
> >      * plan so that they'll be properly rechecked by EvalPlanQual testing.
> > I believe that that problem applies for this optimization as well,
> > and thus that you can only remove implied quals in plain SELECT.
> > At least, if there's some reason why that problem does *not* apply,
> > there had darn well better be a comment explaining why it's safe.
> 
> This is done in check_partial_indexes using parse_rowmark.

No, using plan_rowmark. It is called for expanded inhertance.

> The problem haven't realized with the previous patch because it
> (accidentially) used rel->baserestirictinfo, not
> index->indrinfos for scan_clauses in create_scan_plan.
> 
> But the way how create_scan_plan gets scan_clauses seems
> bad. I haven't have any clean idea to deal with it.
> 
> > * Adding indexrinfos to IndexPath seems unnecessary, since that struct
> > already has the "index" pointer --- you can just get the field out of the
> > IndexOptInfo when you need it.  If you insist on having the extra field,
> > this patch is short of the threshold for correctness on adding fields to
> > paths.  It missed _outIndexPath for instance.
> 
> Sorry for the stupid code. I don't insist to keep it. Removed.
> 
> > * The additional #include in costsize.c has no apparent reason.
> 
> Thank you for pointing out. Removed.
> 
> > * The changes in cost_index() really ought to come with some change
> > in the associated comments.
> 
> I tried to add a comment but it doesn't seem clear.
> 
> > * Personally I'd not change the signature of
> > match_restriction_clauses_to_index; that's just code churn that
> > may have to get undone someday.
> 
> No problem, reverted.
> 
> > * The block comment in check_index_only needs more thought than this,
> > as the phrase "The same is true" is now invalid; the "same" it refers
> > to *isn't* the same anymore.
> 
> Maybe I took this "the same" wrongly. Tried to fix it but I'm not
> confident on the result.
> 
> > * I'm not too thrilled with injecting the initialization of
> > index->indrinfos into the initial loop in check_partial_indexes().
> > If it stays there, I'd certainly expect the comment ahead of the
> > loop to be changed to have something to do with reality.  But can't
> > we find some more-appropriate place to initialize it?  Like maybe
> > where the IndexOptInfo is first created?  I would not really expect
> > check_partial_indexes() to have side-effects on non-partial indexes.
> 
> Mmm. That is quote right in general. IndexOptInfo is created in
> get_relation_info() but baserestrictinfo has not been fixed at
> the point. It is fixed as late as set_append_rel_size, almost
> just before set_rel_size, and just before the
> check_partial_indexes. But initializing indrinfos as a
> side-effect of check_partial_indexes is not good as you pointed.
> But it is called in two ways, set_tablesample_rel_size and
> set_plain_rel_size. So the only possible position of that other
> than check_partial_indexes is set_rel_size.
> 
> 
> > * I think the double loop in check_partial_indexes() is too cute by half.
> > I'd be inclined to just build the replacement list unconditionally while
> > we do the predicate_implied_by() tests.  Those are expensive enough that
> > saving one lappend per implication-test is a useless optimization,
> > especially if it requires code as contorted and bug-prone as this.
> 
> Ok, I removed the too cute part and added comment mentioning the
> reason for the unconditional replacement.
> 
> > * The comment added to IndexOptInfo is not very adequate, and not spelled
> > correctly either.  There's a block comment you should be adding a para to
> > (probably take the text you added for struct IndexPath).  
> 
> I understand that you are mentioning here.
> 
> +  List     *indrinfos;    /* baseristrict info which are not implied by
> +                           * indpred */
> 
> I rewritten to make sense, maybe.
> 
> > And again,
> > there is more work to do to add a field to such a struct, eg outfuncs.c.
> > Usually a good way to find all the places to touch is to grep for some of
> > the existing field names in the struct.
> 
> Sorry, I just forgot of that. (In spite that I myself give such
> kind of comments..) Yeah, I love find-grep on emacs.
> 
> By the way, I found this comment in copyfuncs.c but I couldn't
> find the "subsidiary structs".
> 
> |  * We don't support copying RelOptInfo, IndexOptInfo, or Path nodes.
> |  * There are some subsidiary structs that are useful to copy, though.
> 
> Finally, all I added for this was one line in _outIndexOptInfo.
> 
> > * I don't much care for the field name "indrinfos"; it's neither very
> > readable nor descriptive.  Don't have a better suggestion right now
> > though.
> 
> I agree with you. I didn't like the name so I rethought that. I
> followed the seeming rule that prefixing with 'ind' to the field
> name, but it is not for index, but for the parent relation. So I
> renamed it as "baserestrictinfo" in this version.
> 
> > * Not sure if new regression test cases would be appropriate.  The changes
> > in the existing cases seem a bit unfortunate actually; I'm afraid that
> > this may be defeating the original intent of those tests.
> 
> Only aggregates.out is modifed in this patch. The comment for the
> test says that,
> 
> > --
> > -- Test cases that should be optimized into indexscans instead of
> > -- the generic aggregate implementation.
> ...
> > -- try it on an inheritance tree
> ...
> > explain (costs off)
> >   select min(f1), max(f1) from minmaxtest;
> 
> and 
> 
> > -- DISTINCT doesn't do anything useful here, but it shouldn't fail
> > explain (costs off)
> >   select distinct min(f1), max(f1) from minmaxtest;
> 
> Utterly no problem from the point of the comment. Although this
> patch removes "Index Cond"s for the index minmaxtest3i, it is
> simplly caused by a index predicate on the index, which is the
> very result of this patch.
> 
> > I'm setting this back to Waiting on Author.
> 
> Attached the new version v8.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

David Steele

Дата:

16 марта 2016 г., 19:13:11

On 3/9/16 3:29 AM, Kyotaro HORIGUCHI wrote:

> Hello, thank you for the comments. The new v8 patch is attched.

As far as I can see this patch should be marked "ready for review" so
now it is.

Thanks,
-- 
-David
david@pgmasters.net

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

17 марта 2016 г., 11:19:11

Thank for changing status.

At Wed, 16 Mar 2016 12:13:07 -0400, David Steele <david@pgmasters.net> wrote in <56E98613.5000003@pgmasters.net>
> On 3/9/16 3:29 AM, Kyotaro HORIGUCHI wrote:
> 
> > Hello, thank you for the comments. The new v8 patch is attched.
> 
> As far as I can see this patch should be marked "ready for review" so
> now it is.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

20 марта 2016 г., 19:00:18

Hi,

On 03/09/2016 09:29 AM, Kyotaro HORIGUCHI wrote:
> Hello, thank you for the comments. The new v8 patch is attched.

I've looked at v8, and I do have a few minor comments:

1) indxpath.c uses get_plan_rowmark without including optimizer/prep.h
so the compiler complains about missing prototype etc.

2) check_partial_indexes should probably add back the 'break' at the end
of the initial loop, as index->baserestrictinfo is not initialized elsewhere

3) match_restriction_clauses_to_index does not need the 'rel' parameter
anymore, so we can remove it

Attached is v9 of the patch addressing those points, and also tweaking
few comments - either fixing typos, or perhaps making them a bit more
clear. And also whitespace/formatting on a few places.

FWIW if this patch gets committed, I think it's fair to list Kyotaro-san
as the primary author and myself as a secondary one.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

partial-index-only-scan-v9.patch

Re: PATCH: index-only scans with partial indexes

От

Kevin Grittner

Дата:

29 марта 2016 г., 21:32:41

I tried to whip this into shape, but there were a few areas I
didn't feel I had the necessary understanding to feel comfortable
taking on the committer role for it.  I've cleaned it up the best I
could, fixing whitespace and typos, eliminating an unnecessary
addition of an include, improving C comments (at least to my eye),
and adding an Assert where it seemed like a good idea to me.

My own tests and those of others show performance improvements (up
to 10x for some tests) and no significant performance degradation,
even with attempts to create "worst case" scenarios.

The only changes to the regression tests are to change an "out"
file to eliminate re-checking the index predicate against the heap
on every matching row, which seems like a good thing.

I'm taking my name off as committer and marking it "Ready for
Committer".  If someone else wants to comment on the issues where
Tom and Kyotaro-san still seem unsatisfied to the point where I
can get my head around it, I could maybe take it back on as
committer -- if anyone feels that could be a net win.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

partial-index-only-scan-v10.patch

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

30 марта 2016 г., 07:01:31

Thank you for polishing this.

At Tue, 29 Mar 2016 13:31:19 -0500, Kevin Grittner <kgrittn@gmail.com> wrote in
<CACjxUsNm9+tn0Hat0p4wR+sF0bfQDYMPLOEw7FyDycQ2UUrbyA@mail.gmail.com>
> I tried to whip this into shape, but there were a few areas I
> didn't feel I had the necessary understanding to feel comfortable
> taking on the committer role for it.  I've cleaned it up the best I
> could, fixing whitespace and typos, eliminating an unnecessary
> addition of an include, improving C comments (at least to my eye),
> and adding an Assert where it seemed like a good idea to me.

Especially for this one,

===
@@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)            continue;            /*
don'trepeat work if already proven OK */        have_partial = true;
 
+        break;    }    if (!have_partial)        return;
===

The initialization has been moved to set_rel_size so the break
can be restored. 


> My own tests and those of others show performance improvements (up
> to 10x for some tests) and no significant performance degradation,
> even with attempts to create "worst case" scenarios.
> 
> The only changes to the regression tests are to change an "out"
> file to eliminate re-checking the index predicate against the heap
> on every matching row, which seems like a good thing.
> 
> I'm taking my name off as committer and marking it "Ready for
> Committer".  If someone else wants to comment on the issues where
> Tom and Kyotaro-san still seem unsatisfied to the point where I
> can get my head around it, I could maybe take it back on as
> committer -- if anyone feels that could be a net win.

FWIW, as mentioned upthread, I added the following condition to
decline ripping index predicates from base restrictinfo without
understanding the reason to do so.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

30 марта 2016 г., 16:40:33

Hi,

On 03/30/2016 06:01 AM, Kyotaro HORIGUCHI wrote:
> Thank you for polishing this.
>
> At Tue, 29 Mar 2016 13:31:19 -0500, Kevin Grittner <kgrittn@gmail.com> wrote in
<CACjxUsNm9+tn0Hat0p4wR+sF0bfQDYMPLOEw7FyDycQ2UUrbyA@mail.gmail.com>
>> I tried to whip this into shape, but there were a few areas I
>> didn't feel I had the necessary understanding to feel comfortable
>> taking on the committer role for it.  I've cleaned it up the best I
>> could, fixing whitespace and typos, eliminating an unnecessary
>> addition of an include, improving C comments (at least to my eye),
>> and adding an Assert where it seemed like a good idea to me.
>
> Especially for this one,
>
> ===
> @@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
>              continue;            /* don't repeat work if already proven OK */
>
>          have_partial = true;
> +        break;
>      }
>      if (!have_partial)
>          return;
> ===
>
> The initialization has been moved to set_rel_size so the break
> can be restored.

FWIW the break was restored in the v9 by me.

>
>
>> My own tests and those of others show performance improvements (up
>> to 10x for some tests) and no significant performance degradation,
>> even with attempts to create "worst case" scenarios.
>>
>> The only changes to the regression tests are to change an "out"
>> file to eliminate re-checking the index predicate against the heap
>> on every matching row, which seems like a good thing.
>>
>> I'm taking my name off as committer and marking it "Ready for
>> Committer".  If someone else wants to comment on the issues where
>> Tom and Kyotaro-san still seem unsatisfied to the point where I
>> can get my head around it, I could maybe take it back on as
>> committer -- if anyone feels that could be a net win.
>
> FWIW, as mentioned upthread, I added the following condition to
> decline ripping index predicates from base restrictinfo without
> understanding the reason to do so.

Ummmm, I'm a bit confused. Which condition?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kevin Grittner

Дата:

30 марта 2016 г., 18:49:30

On Wed, Mar 30, 2016 at 8:40 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

>> ===
>> @@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo
>> *rel)
>>                         continue;                       /* don't repeat
>> work if already proven OK */
>>
>>                 have_partial = true;
>> +               break;
>>         }
>>         if (!have_partial)
>>                 return;
>> ===
>>
>> The initialization has been moved to set_rel_size so the break
>> can be restored.
>
> FWIW the break was restored in the v9 by me.

Yeah, I kept that in v10, so the three of us seem to be on the same
page there.

>> FWIW, as mentioned upthread, I added the following condition to
>> decline ripping index predicates from base restrictinfo without
>> understanding the reason to do so.
>
> Ummmm, I'm a bit confused. Which condition?

Yeah, any additional discussion about areas which anyone sees as
open or still needing attention might allow me to get enough
traction to wrap this; I'm having trouble seeing what the pending
issues are where both Tom and Kyotaro-san seemed to be unsatisfied.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PATCH: index-only scans with partial indexes

От

Tom Lane

Дата:

31 марта 2016 г., 02:36:13

Kevin Grittner <kgrittn@gmail.com> writes:
> I'm taking my name off as committer and marking it "Ready for
> Committer".  If someone else wants to comment on the issues where
> Tom and Kyotaro-san still seem unsatisfied to the point where I
> can get my head around it, I could maybe take it back on as
> committer -- if anyone feels that could be a net win.

I'll pick it up.  In a quick scan I see some things I don't like, but
nothing insoluble:

* Having plancat.c initialize the per-IndexOptInfo restriction lists seems
fairly bizarre.  I realize that Tomas or Kyotaro probably did that in
response to my complaint about having check_partial_indexes have
side-effects on non-partial indexes, but this isn't an improvement.  For
one thing, it means we will produce an incorrect plan if more restriction
clauses are added to the rel after plancat.c runs, as the comment for
check_partial_indexes contemplates.  (I *think* that comment may be
obsolete, but I'm not sure.)  I think a better answer to that complaint is
to rename check_partial_indexes to something else, and more importantly
adjust its header comment to reflect its expanded responsibilities --- as
the patch I was commenting on at the time failed to do.

* It took me some time to convince myself that the patch doesn't break
generation of plans for EvalPlanQual.  I eventually found where it
skips removal of restrictions for target relations, but that's drastically
undercommented.

* Likewise, there's inadequate documentation of why it doesn't break
bitmap scans, which also have to be able to recheck correctly.

* I'm not sure that we have any regression test cases covering the
above two points, but we should.

* The comments leave a lot to be desired in general, eg there are
repeated failures to update comments only a few lines away from a change.
        regards, tom lane

Re: PATCH: index-only scans with partial indexes

От

Tomas Vondra

Дата:

31 марта 2016 г., 09:43:04

On 03/31/2016 01:36 AM, Tom Lane wrote:
> Kevin Grittner <kgrittn@gmail.com> writes:
>> I'm taking my name off as committer and marking it "Ready for
>> Committer".  If someone else wants to comment on the issues where
>> Tom and Kyotaro-san still seem unsatisfied to the point where I
>> can get my head around it, I could maybe take it back on as
>> committer -- if anyone feels that could be a net win.
>
> I'll pick it up.  In a quick scan I see some things I don't like, but
> nothing insoluble:
>
> * Having plancat.c initialize the per-IndexOptInfo restriction lists seems
> fairly bizarre.  I realize that Tomas or Kyotaro probably did that in
> response to my complaint about having check_partial_indexes have
> side-effects on non-partial indexes, but this isn't an improvement.  For
> one thing, it means we will produce an incorrect plan if more restriction
> clauses are added to the rel after plancat.c runs, as the comment for
> check_partial_indexes contemplates.  (I *think* that comment may be
> obsolete, but I'm not sure.)  I think a better answer to that complaint is
> to rename check_partial_indexes to something else, and more importantly
> adjust its header comment to reflect its expanded responsibilities --- as
> the patch I was commenting on at the time failed to do.
>
> * It took me some time to convince myself that the patch doesn't break
> generation of plans for EvalPlanQual.  I eventually found where it
> skips removal of restrictions for target relations, but that's drastically
> undercommented.
>
> * Likewise, there's inadequate documentation of why it doesn't break
> bitmap scans, which also have to be able to recheck correctly.
>
> * I'm not sure that we have any regression test cases covering the
> above two points, but we should.
>
> * The comments leave a lot to be desired in general, eg there are
> repeated failures to update comments only a few lines away from a change.

Kyotaro,

I may look into fixing those issues early next week, but that's fairly 
close to the freeze. Also, I'll have to look into parts that I'm not 
very familiar with (like the EvalPlanQual), so feel free to address 
those issues ;-)

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

31 марта 2016 г., 12:18:30

Hi,

At Wed, 30 Mar 2016 15:40:24 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<8ad11fae-1cb7-2255-d80c-d1daafb5362f@2ndquadrant.com>
> FWIW the break was restored in the v9 by me.

Yeah, I know it. Sorry for the misleading comment.

> > FWIW, as mentioned upthread, I added the following condition to
> > decline ripping index predicates from base restrictinfo without
> > understanding the reason to do so.
> 
> Ummmm, I'm a bit confused. Which condition?

I sent the messages incomplete. The "condition" mentioned above
is the following.

| rel->relid == root->parse->resultRelation


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

31 марта 2016 г., 15:54:33

Thank you for the comment. The new version is attached.

Some issues has not been addressed but the rest will be addresses
in the next version.

At Thu, 31 Mar 2016 08:42:50 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<90d885f2-e5ce-6668-226f-c817154e4f73@2ndquadrant.com>
> On 03/31/2016 01:36 AM, Tom Lane wrote:
> > Kevin Grittner <kgrittn@gmail.com> writes:
> >> I'm taking my name off as committer and marking it "Ready for
> >> Committer".  If someone else wants to comment on the issues where
> >> Tom and Kyotaro-san still seem unsatisfied to the point where I
> >> can get my head around it, I could maybe take it back on as
> >> committer -- if anyone feels that could be a net win.
> >
> > I'll pick it up.  In a quick scan I see some things I don't like, but
> > nothing insoluble:
> >
> > * Having plancat.c initialize the per-IndexOptInfo restriction lists
> > * seems
> > fairly bizarre.  I realize that Tomas or Kyotaro probably did that in
> > response to my complaint about having check_partial_indexes have
> > side-effects on non-partial indexes, but this isn't an improvement.

I felt the same thing for it. It is for discussion. It is the
earliest point where we can initialize baserestrictinfo of each
IndexOptInfo. And the original location is just after and is the
latest point. Reverted.

> > For one thing, it means we will produce an incorrect plan if
> > more restriction clauses are added to the rel after plancat.c
> > runs, as the comment for check_partial_indexes contemplates.

Mmm. Thank you. I overlooked that. The following code seems
saying related thing.

>         if (index->predOK)
>             continue;            /* don't repeat work if already proven OK */

So, index restrinctioninfo should be calculated even if
index->predOK is already true. But refactroring suggested by the
following comment will fix this.

> > (I *think* that comment may be obsolete, but I'm not sure.)
> > I think a better answer to that complaint is to rename
> > check_partial_indexes to something else, and more importantly
> > adjust its header comment to reflect its expanded
> > responsibilities --- as the patch I was commenting on at the
> > time failed to do.

Ok, I removed the index baserestrictinfo (a bit long to repeat..)
creation out of check_partial_indexes and named
rebuild_index_restrictinfo. (calling for better names..)

This loosens the restriction on the timing to trim an index
restrictinfo. It is now moved to create_index_paths.

> > * It took me some time to convince myself that the patch doesn't break
> > generation of plans for EvalPlanQual.  I eventually found where it
> > skips removal of restrictions for target relations, but that's
> > drastically undercommented.

It is originally in create_indexscan_plan and had no
documentation about EPQ. But I also want such documentation there
so I added it on in rebuild_index_restrictinfo.

> > * Likewise, there's inadequate documentation of why it doesn't break
> > bitmap scans, which also have to be able to recheck correctly.

This trims clauses that implied by index predicate, which
donesn't need recheck even if it is a bitmap scan, I believe. Is
it wrong? The original comment on check_index_only said that,

>      * XXX this is overly conservative for partial indexes, since we will
>      * consider attributes involved in the index predicate as required even
>      * though the predicate won't need to be checked at runtime.  (The same is
>      * true for attributes used only in index quals, if we are certain that
>      * the index is not lossy.)  However, it would be quite expensive to
>      * determine that accurately at this point, so for now we take the easy
>      * way out.

This seems to me that such clauses are safely ignored. But not
for attributes used only in index quals, because of possible
lossy charcter of the index. It seems quite reasonable. This
optimization won't be hold if this comment is wrong.

> > * I'm not sure that we have any regression test cases covering the
> > above two points, but we should.

I agree. Will try to add in the next version, maybe posted tomorrow.

> > * The comments leave a lot to be desired in general, eg there are
> > repeated failures to update comments only a few lines away from a
> > change.

I'll recheck that and fix in the next version.

> Kyotaro,
> 
> I may look into fixing those issues early next week, but that's fairly
> close to the freeze. Also, I'll have to look into parts that I'm not
> very familiar with (like the EvalPlanQual), so feel free to address
> those issues ;-)

Aye sir!

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: PATCH: index-only scans with partial indexes

От

Tom Lane

Дата:

31 марта 2016 г., 21:51:12

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> Thank you for the comment. The new version is attached.

Committed with rather heavy editorialization and a batch of regression
test cases.
        regards, tom lane

Re: PATCH: index-only scans with partial indexes

От

Kyotaro HORIGUCHI

Дата:

01 апреля 2016 г., 05:40:13

At Thu, 31 Mar 2016 14:51:02 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <19589.1459450262@sss.pgh.pa.us>
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> > Thank you for the comment. The new version is attached.
> 
> Committed with rather heavy editorialization and a batch of regression
> test cases.
> 
>             regards, tom lane

Many thanks for the editorialization (or refactoring), and many
descriptive comments and testing, then committing.

There seems no problem for me of that. The followings are my
reviw and thought on that, FWIW.

======
check_index_predicates:+ * At one time it was possible for this to get re-run after adding more+ * restrictions to the
rel,thus possibly letting us prove more indexes OK.+ * That doesn't happen any more (at least not in the core code's
usage),
!+ * but this code still supports it in case extensions want to mess with the
!+ * baserestrictinfo list.  We assume that adding more restrictions can't make+ * an index not predOK.  We must
recomputeindrestrictinfo each time, though,+ * to make sure any newly-added restrictions get into it if needed.

I didn't imagine that the assumption is for the sake of extensions..

+check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
...
+        index->indrestrictinfo = rel->baserestrictinfo;

Ah. This has been retuened here.

+    is_target_rel = (rel->relid == root->parse->resultRelation ||
+                     get_plan_rowmark(root->rowMarks, rel->relid) != NULL);

This is very descriptive even for me.

-     * We can also discard quals that are implied by a partial index's
-     * predicate, but only in a plain SELECT; when scanning a target relation
-     * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the
-     * plan so that they'll be properly rechecked by EvalPlanQual testing.
-     *

Uggg. I'm sorry that I couldn't find this commnet just above what
I removed.

+     * way because predicate conditions need to be rechecked if the scan
+     * becomes lossy, so they have to be included in bitmapqualorig.

Of couse 'lossy' means 'may contain unqualified data', my brain
must have been in powersaving mode.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: PATCH: index-only scans with partial indexes

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения