Обсуждение: Poor man's partitioned index .... not being used?

Поиск
Список
Период
Сортировка

Poor man's partitioned index .... not being used?

От
Gunther
Дата:
Hi all, look at this short story please:

foo=# CREATE TABLE Test(id int NOT NULL PRIMARY KEY);
CREATE TABLE
foo=# INSERT INTO test SELECT row_number() OVER() FROM pg_class a CROSS JOIN pg_class b;
INSERT 0 388129
foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';                               QUERY PLAN
---------------------------------------------------------------------------Index Only Scan using test_pkey on test  (cost=0.42..8.44 rows=1 width=4)  Index Cond: (id = 8934)
(2 rows)

foo=# ALTER TABLE Test DROP CONSTRAINT Test_pkey;
ALTER TABLE
foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';                      QUERY PLAN
-------------------------------------------------------Seq Scan on test  (cost=0.00..6569.61 rows=1 width=4)  Filter: (id = 8934)
(2 rows)

foo=# SELECT max(id)/2 FROM Test;?column?
----------  194064
(1 row)

foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE id < 194064;
CREATE INDEX
foo=# CREATE UNIQUE INDEX Test_pk1 ON Test(id) WHERE id >= 194064;
CREATE INDEX
foo=# ANALYZE Test;
ANALYZE
foo=# EXPLAIN SELECT * FROM Test WHERE id = 8934;                               QUERY PLAN
--------------------------------------------------------------------------Index Only Scan using test_pk0 on test  (cost=0.42..8.44 rows=1 width=4)  Index Cond: (id = 8934)
(2 rows)


foo=# DROP INDEX Test_pk0;
DROP INDEX
foo=# DROP INDEX Test_pk1;
DROP INDEX

foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE mod(id,2) = 0;
CREATE INDEX
foo=# CREATE UNIQUE INDEX Test_pk1 ON Test(id) WHERE mod(id,2) = 1;
CREATE INDEX
foo=# ANALYZE Test;
ANALYZE
foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';                     QUERY PLAN
-------------------------------------------------------Seq Scan on test  (cost=0.00..6569.61 rows=1 width=4)  Filter: (id = 8934)
(2 rows)

Why is that index never used?

PS: there is a performance question behind this, big table, heavily used index, 
the hope was that with this simple scheme of partitioning just the index one might
distribute the load better. I know, if the load really is so big, why not partition
the entire table. But just for hecks, why not this way?

regards,
-Gunther

Re: Poor man's partitioned index .... not being used?

От
David Rowley
Дата:
On Thu, 21 Mar 2019 at 15:51, Gunther <raj@gusw.net> wrote:
> foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE mod(id,2) = 0;
> CREATE INDEX
> foo=# CREATE UNIQUE INDEX Test_pk1 ON Test(id) WHERE mod(id,2) = 1;
> CREATE INDEX
> foo=# ANALYZE Test;
> ANALYZE
> foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';
>                   QUERY PLAN
> -------------------------------------------------------
>  Seq Scan on test  (cost=0.00..6569.61 rows=1 width=4)
>    Filter: (id = 8934)
> (2 rows)
>
> Why is that index never used?

When the planner looks at partial indexes to see if they'll suit the
scan, the code that does the matching (predicate_implied_by()) simply
does not go to that much trouble to determine if it matches. If you
look at operator_predicate_proof() you'll see it requires the
expression on at least one side of the OpExpr to match your predicate.
Yours matches on neither side since "id" is wrapped up in a mod()
function call.

Certainly, predicate_implied_by() is by no means finished, new smarts
have been added to it over the years to allow it to prove more cases,
but each time something is added we still need to carefully weigh up
the additional overhead of the new code vs. possible benefits.

It may be possible to do something with immutable functions found in
the expr but someone doing so might have a hard time proving that it's
always safe to do so. For example, arg 2 of your mod() call is a
Const. If it had been another Var then it wouldn't be safe to use.
What other unsafe cases are there? Is there a way we can always
identify unsafe cases during planning? ... are the sorts of questions
someone implementing this would be faced with.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Poor man's partitioned index .... not being used?

От
Andrew Gierth
Дата:
>>>>> "Gunther" == Gunther  <raj@gusw.net> writes:

 Gunther> foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE mod(id,2) = 0;
 Gunther> CREATE INDEX

 Gunther> foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934';
 Gunther>                   QUERY PLAN
 Gunther> -------------------------------------------------------
 Gunther>  Seq Scan on test  (cost=0.00..6569.61 rows=1 width=4)
 Gunther>    Filter: (id = 8934)
 Gunther> (2 rows)

 Gunther> Why is that index never used?

Because the expression mod(id,2) does not appear in the query, and there
is no logic in the implication prover to prove that (mod(id,2) = 0) is
implied by (id = 8934).

If you did  WHERE mod(id,2) = mod(8934,2) AND id = 8934

then the index would likely be used - because the prover can then treat
mod(id,2) as an atom (call it X), constant-fold mod(8934,2) to 0 because
mod() is immutable, and then observe that (X = 0) proves that (X = 0).

Pretty much the only simple implications that the prover can currently
deduce are:

  - identical immutable subexpressions are equivalent

  - strict operator expressions imply scalar IS NOT NULL

  - (A op1 B) implies (B op2 A) if op2 is op1's declared commutator

  - Btree semantics: if <, <=, =, >=, > are all members of a btree
    opfamily, and <> is the declared negator of =, then implications
    like (X < A) and (A <= B) implies (X < B) can be deduced.

-- 
Andrew (irc:RhodiumToad)


Re: Poor man's partitioned index .... not being used?

От
Gunther
Дата:

Thanks David Rowley and Andrew Gierth.

On 3/20/2019 23:46, Andrew Gierth wrote:
If you did  WHERE mod(id,2) = mod(8934,2) AND id = 8934

then the index would likely be used - because the prover can then treat
mod(id,2) as an atom (call it X), constant-fold mod(8934,2) to 0 because
mod() is immutable, and then observe that (X = 0) proves that (X = 0).
foo=# EXPLAIN SELECT * FROM Test WHERE mod(id,2) = mod(8934,2) AND id = 8934;                               QUERY PLAN
--------------------------------------------------------------------------Index Only Scan using test_pk0 on test  (cost=0.42..8.44 rows=1 width=4)  Index Cond: (id = 8934)
(2 rows)

Yes indeed! It's being used that way! Interesting. Only that we can't use it if id was a variable? Hmm ...

foo=# PREPARE testplan(int) AS 
foo-#    SELECT * FROM Test WHERE mod(id,2) = mod($1,2) AND id = $1;
PREPARE
foo=# EXPLAIN EXECUTE testplan(8934);                               QUERY PLAN
--------------------------------------------------------------------------Index Only Scan using test_pk0 on test  (cost=0.42..8.44 rows=1 width=4)  Index Cond: (id = 8934)
(2 rows)

That's quite alright actually. Now the questions is, could we use this in a nested loop query plan? That's where I think it can't work:

foo=# CREATE TABLE Test2 AS SELECT * FROM Test WHERE random() < 0.01 ORDER BY id DESC;
SELECT 3730
integrator=# ANALYZE Test2;
ANALYZE
foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10;                                QUERY PLAN
-----------------------------------------------------------------------------Limit  (cost=110.25..135.67 rows=10 width=8)  ->  Hash Right Join  (cost=110.25..9591.02 rows=3730 width=8)        Hash Cond: ((mod(b.id, 2) = mod(a.id, 2)) AND (b.id = a.id))        ->  Seq Scan on test b  (cost=0.00..5599.29 rows=388129 width=4)        ->  Hash  (cost=54.30..54.30 rows=3730 width=4)              ->  Seq Scan on test2 a  (cost=0.00..54.30 rows=3730 width=4)
(6 rows)

foo=# SET enable_hashjoin TO off;
SET
foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10;                                  QUERY PLAN
--------------------------------------------------------------------------------Limit  (cost=47214.73..47227.86 rows=10 width=8)  ->  Merge Right Join  (cost=47214.73..52113.16 rows=3730 width=8)        Merge Cond: (((mod(b.id, 2)) = (mod(a.id, 2))) AND (b.id = a.id))        ->  Sort  (cost=46939.15..47909.47 rows=388129 width=4)              Sort Key: (mod(b.id, 2)), b.id              ->  Seq Scan on test b  (cost=0.00..5599.29 rows=388129 width=4)        ->  Sort  (cost=275.58..284.91 rows=3730 width=4)              Sort Key: (mod(a.id, 2)), a.id              ->  Seq Scan on test2 a  (cost=0.00..54.30 rows=3730 width=4)
(9 rows)

foo=# SET enable_mergejoin TO off;
SET
foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10;                                  QUERY PLAN
--------------------------------------------------------------------------------Limit  (cost=0.00..102516.78 rows=10 width=8)  ->  Nested Loop Left Join  (cost=0.00..38238760.24 rows=3730 width=8)        Join Filter: ((b.id = a.id) AND (mod(b.id, 2) = mod(a.id, 2)))        ->  Seq Scan on test2 a  (cost=0.00..54.30 rows=3730 width=4)        ->  Materialize  (cost=0.00..9056.93 rows=388129 width=4)              ->  Seq Scan on test b  (cost=0.00..5599.29 rows=388129 width=4)
(6 rows)

It looks like it doesn't want to evaluate the mod(a.id, 2) before it moves to the index query for the nested loop.

Notably the partitioned table approach should do that, but it has a different expression for the partition. No mod function but MODULUS and REMAINDER.

I wonder if there was a way of marking such expressions as safe in the query, like suggesting a certain evaluation order, i.e.,

SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON(mod(b.id,2) = EVAL(mod(a.id,2)) AND b.id = a.id) LIMIT 10;

It's OK though. It just goes to show that in a case like this, it is best to just go with the partitioned table anyway.

regards,
-Gunther

Re: Poor man's partitioned index .... not being used?

От
David Rowley
Дата:
On Fri, 22 Mar 2019 at 07:57, Gunther <raj@gusw.net> wrote:
> foo=# PREPARE testplan(int) AS
> foo-#    SELECT * FROM Test WHERE mod(id,2) = mod($1,2) AND id = $1;
> PREPARE
> foo=# EXPLAIN EXECUTE testplan(8934);
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>  Index Only Scan using test_pk0 on test  (cost=0.42..8.44 rows=1 width=4)
>    Index Cond: (id = 8934)
> (2 rows)
>
> That's quite alright actually. Now the questions is, could we use this in a nested loop query plan? That's where I
thinkit can't work:
 

Not really. In that case, the parameters were replaced with the
specified values (a.k.a custom plan).  That happens for the first 5
executions of a prepared statement, and in this case likely the
planner will continue to use the custom plan since the generic plan
won't know that the partial index is okay to use and the plan costs
would likely go up enough that the custom plan would continue to be
favoured.

> foo=# SET enable_mergejoin TO off;
> SET
> foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Limit  (cost=0.00..102516.78 rows=10 width=8)
>    ->  Nested Loop Left Join  (cost=0.00..38238760.24 rows=3730 width=8)
>          Join Filter: ((b.id = a.id) AND (mod(b.id, 2) = mod(a.id, 2)))
>          ->  Seq Scan on test2 a  (cost=0.00..54.30 rows=3730 width=4)
>          ->  Materialize  (cost=0.00..9056.93 rows=388129 width=4)
>                ->  Seq Scan on test b  (cost=0.00..5599.29 rows=388129 width=4)
> (6 rows)
>
> It looks like it doesn't want to evaluate the mod(a.id, 2) before it moves to the index query for the nested loop.

Whether partial indexes can be used are not is determined using only
quals that can be applied at the scan level.  In this case your qual
is a join qual, and since no other qual exists that can be evaluated
at the scan level where the index can be used, then it's not
considered.  In any case, nothing there guarantees that one of your
indexes will match all records. For it to work, both of you indexes
would have to be scanned. It's not clear why you think that would be
any better than scanning just one index. I imagine it would only ever
be a win if you could eliminate one of the index scans with some qual
that guarantees that the index can't contain any records matching your
query.

> I wonder if there was a way of marking such expressions as safe in the query, like suggesting a certain evaluation
order,i.e.,
 
>
> SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON(mod(b.id,2) = EVAL(mod(a.id,2)) AND b.id = a.id) LIMIT 10;
>
> It's OK though. It just goes to show that in a case like this, it is best to just go with the partitioned table
anyway.

It sounds like you might want something like partition-wise join that
exists in PG11.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services