Обсуждение: IS NOT DISTINCT FROM statement
I have some query:
EXPLAIN ANALYZE select id from sometable where fkey IS NOT DISTINCT FROM 21580;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Gather (cost=10.00..39465.11 rows=1 width=4) (actual time=0.512..129.625 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on sometable (cost=0.00..39455.01 rows=1 width=4) (actual time=77.995..103.806 rows=0 loops=5)
Filter: (NOT (fkey IS DISTINCT FROM 21580))
Rows Removed by Filter: 675238
Planning time: 0.101 ms
Execution time: 148.517 ms
Other Query:
EXPLAIN ANALYZE select id from table where fkey=21580;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using sometable_index1 on sometable (cost=0.43..8.45 rows=1 width=4) (actual time=0.075..0.076 rows=1 loops=1)
Index Cond: (fkey = 21580)
Planning time: 0.117 ms
Execution time: 0.101 ms
(4 rows)
There is unique index on sometable(fkey);
Is there any reason that „NOT DISTINCT FROM” can’t be autotransformed to „=” when value on right side of expression is not NULL or is this any way to use index with „IS NOT DISTINCT FROM” statement?
Artur Zajac
Artur Zając wrote: > Is there any reason that „NOT DISTINCT FROM” can’t be autotransformed to „=” when value > on right side of expression is not NULL or is this any way to use index with „IS NOT DISTINCT FROM” statement? That would subtly change the semantics of the expression: test=> SELECT NULL IS NOT DISTINCT FROM 21580; ?column? ---------- f (1 row) test=> SELECT NULL = 21580; ?column? ---------- (1 row) One expression is FALSE, the other NULL. It doesn't matter in the context of your specific query, but it could matter. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
On Sat, 9 Mar 2019 at 00:30, Artur Zając <azajac@ang.com.pl> wrote: > Is there any reason that „NOT DISTINCT FROM” can’t be autotransformed to „=” when value on right side of expression isnot NULL or is this any way to use index with „IS NOT DISTINCT FROM” statement? Probably nothing other than nobody has done it yet. It might be reasonable to have some transformation stage called from distribute_restrictinfo_to_rels() when adding single rel RestrictInfos to RTE_RELATION base rels. It's only these you can check for NOT NULL constraints, i.e. not so possible with rtekinds such as RTE_FUNCTION and the like. It becomes more complex if you consider that someone might have added a partial index on the relation that matches the IS NOT DISTINCT FROM clause. In this case, they might not be happy that their index can no longer be used. Fixing that would require some careful surgery on predicate_implied_by() to teach it about IS NOT DISTINCT FROM clauses. However, that seems to go a step beyond what predicate_implied_by() does for now. Currently, it only gets to know about quals. Not the relations they belong to, so there'd be no way to know that the NOT NULL constraint exists from there. I'm not sure if there's a good reason for this or not, it might be because it's not been required before. It gets more complex still if you want to consider other quals in the list to prove not nullness. In short, probably possible, but why not just write an equality clause, if you know NULLs are not possible? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
> In short, probably possible, but why not just write an equality clause, if you know NULLs are not possible? In fact I construct query like this (usually in pl/pgsql). SELECT column FROM table WHERE column1 IS NOT DISTINCT FROM $1 AND column2 IS NOT DISTINCT FROM $2; "IS NOT DISTINCT FROM" statement simplifies the query ($1 OR $2 may be null, col1 and col2 has indexes). I made some workaround. I made function: CREATE OR REPLACE FUNCTION smarteq(v1 int,v2 INT) RETURNS BOOL AS $BODY$ SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END); $BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE; And then explain analyze select id from sometable where smarteq(id1,21580); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Index Scan using sometable_index1 on sometable (cost=0.43..8.45 rows=1 width=4) (actual time=0.085..0.086 rows=1 loops=1) Index Cond: (id1 = 21580) Planning time: 0.223 ms Execution time: 0.117 ms (4 rows) explain analyze select id from sometable where smarteq(id1,NULL); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on sometable (cost=19338.59..57045.02 rows=882343 width=4) (actual time=116.236..306.304 rows=881657 loops=1) Recheck Cond: (id1 IS NULL) Heap Blocks: exact=9581 -> Bitmap Index Scan on sometable_index1 (cost=0.00..19118.00 rows=882343 width=0) (actual time=114.209..114.209 rows=892552loops=1) Index Cond: (id1 IS NULL) Planning time: 0.135 ms Execution time: 339.229 ms It looks like it works, but I must check if it will still works in plpgsql (I expect some problems if query is prepared). Artur Zajac
On Sat, 9 Mar 2019 at 01:25, Artur Zając <azajac@ang.com.pl> wrote: > I made some workaround. I made function: > > CREATE OR REPLACE FUNCTION smarteq(v1 int,v2 INT) RETURNS BOOL AS > $BODY$ > SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END); > $BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE; > explain analyze select id from sometable where smarteq(id1,NULL); > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------- > Bitmap Heap Scan on sometable (cost=19338.59..57045.02 rows=882343 width=4) (actual time=116.236..306.304 rows=881657loops=1) > Recheck Cond: (id1 IS NULL) > Heap Blocks: exact=9581 > -> Bitmap Index Scan on sometable_index1 (cost=0.00..19118.00 rows=882343 width=0) (actual time=114.209..114.209 rows=892552loops=1) > Index Cond: (id1 IS NULL) > Planning time: 0.135 ms > Execution time: 339.229 ms > > It looks like it works, but I must check if it will still works in plpgsql (I expect some problems if query is prepared). I think with either that you'll just be at the mercy of whether a generic or custom plan is chosen. If you get a custom plan then likely your case statement will be inlined and constant folded away, but for a generic plan, that can't happen since those constants are not consts, they're parameters. Most likely, if you've got an index on the column you'll perhaps always get a custom plan as the generic plan would result in a seqscan and it would have to evaluate your case statement for each row. By default, generic plans are only considered on the 6th query execution and are only chosen if the generic cost is cheaper than the average custom plan cost + fuzz cost for planning. PG12 gives you a bit more control over that with the plan_cache_mode GUC, but... that's the not out yet. However, possibly the cost of planning each execution is cheaper than doing the seq scan, so you might be better off with this. There is a risk that the planner does for some reason choose a generic plan and ends up doing the seq scan, but for that to happen likely the table would have to be small, in which case it wouldn't matter or the costs would have to be off, which might cause you some pain. The transformation mentioned earlier could only work if the arguments of the IS NOT DISTINCT FROM were Vars or Consts. It couldn't work with Params since the values are unknown to the planner. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > On Sat, 9 Mar 2019 at 01:25, Artur Zając <azajac@ang.com.pl> wrote: >> CREATE OR REPLACE FUNCTION smarteq(v1 int,v2 INT) RETURNS BOOL AS >> $BODY$ >> SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END); >> $BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE; > The transformation mentioned earlier could only work if the arguments > of the IS NOT DISTINCT FROM were Vars or Consts. It couldn't work with > Params since the values are unknown to the planner. Just looking at this example, I'm wondering if there'd be any value in adding a rule to eval_const_expressions that converts IS DISTINCT FROM with one constant-NULL argument into an IS NOT NULL test on the other argument. Doing anything with the general case would be hard, as you mentioned, but this "workaround" suggests that the OP isn't actually concerned with the general case. [ experiments... ] Oh, look at this: regression=# explain verbose select f1 is distinct from null from int4_tbl; QUERY PLAN --------------------------------------------------------------- Seq Scan on public.int4_tbl (cost=0.00..1.05 rows=5 width=1) Output: (f1 IS NOT NULL) (2 rows) regression=# explain verbose select f1 is not distinct from null from int4_tbl; QUERY PLAN --------------------------------------------------------------- Seq Scan on public.int4_tbl (cost=0.00..1.05 rows=5 width=1) Output: (f1 IS NULL) (2 rows) So somebody already inserted this optimization, but I don't see it happening in eval_const_expressions ... oh, it's way earlier, in transformAExprDistinct: /* * If either input is an undecorated NULL literal, transform to a NullTest * on the other input. That's simpler to process than a full DistinctExpr, * and it avoids needing to require that the datatype have an = operator. */ if (exprIsNullConstant(rexpr)) return make_nulltest_from_distinct(pstate, a, lexpr); if (exprIsNullConstant(lexpr)) return make_nulltest_from_distinct(pstate, a, rexpr); I'm hesitant to call that wrong; the ability to avoid a dependency on an "=" operator is kind of nice. But it doesn't help for cases requiring a Param substitution. So maybe if we *also* had a check for this in eval_const_expressions, that would address the OP's problem. But the use-case would be a bit narrow given that the parser is catching the simplest case. regards, tom lane