Обсуждение: Bad plan when null is in an "in" list

Поиск
Список
Период
Сортировка

Bad plan when null is in an "in" list

От
Csaba Nagy
Дата:
Hi all,

Jumping in directly to the subject, this is what I get:

explain SELECT bigint_col_1, bigint_col_2 FROM big_table WHERE
bigint_col_2 in (12132131, null, null, null,
null);

QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on big_table  (cost=0.00..2447201.85 rows=448 width=16)
   Filter: ((bigint_col_2 = 12132131::bigint) OR NULL::boolean)
(2 rows)


Compared to:


explain SELECT bigint_col_1, bigint_col_2 FROM big_table WHERE
bigint_col_2 in (12132131, 123781, 1297839032, 123667123);

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using dom_idx_member_bigint_col_2,
dom_idx_member_bigint_col_2, dom_idx_member_bigint_col_2,
dom_idx_member_bigint_col_2 on big_table  (cost=0.00..6427.28 rows=1789
width=16)
   Index Cond: ((bigint_col_2 = 12132131) OR (bigint_col_2 = 123781) OR
(bigint_col_2 = 1297839032) OR (bigint_col_2 = 123667123))
(2 rows)


big_table has ~ 100 million rows.


Considering that NULL::boolean is always false, I don't quite understand
why the first query is going for a sequential scan, instead of just
ignoring the nulls from the in clause...
We have a lot of queries like this for chunking activities, and they
work too on oracle. We've just migrated a bigger data base to postgres
from oracle, and this is the first thing to slow down our system to a
crawl... we will fix this, but the parser could be smarter I guess.

Cheers,
Csaba.



Re: Bad plan when null is in an "in" list

От
Jaime Casanova
Дата:
On 7/26/05, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> Hi all,
>
> Jumping in directly to the subject, this is what I get:
>
> explain SELECT bigint_col_1, bigint_col_2 FROM big_table WHERE
> bigint_col_2 in (12132131, null, null, null,
> null);
>
> QUERY PLAN
> -------------------------------------------------------------------
>  Seq Scan on big_table  (cost=0.00..2447201.85 rows=448 width=16)
>   Filter: ((bigint_col_2 = 12132131::bigint) OR NULL::boolean)
> (2 rows)
>
>

this is because null values can't be indexed... or telling other way
the planner will never choose an index for comparing to null

maybe a partial index can be used?


> Compared to:
>
>
> explain SELECT bigint_col_1, bigint_col_2 FROM big_table WHERE
> bigint_col_2 in (12132131, 123781, 1297839032, 123667123);
>
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using dom_idx_member_bigint_col_2,
> dom_idx_member_bigint_col_2, dom_idx_member_bigint_col_2,
> dom_idx_member_bigint_col_2 on big_table  (cost=0.00..6427.28 rows=1789
> width=16)
>   Index Cond: ((bigint_col_2 = 12132131) OR (bigint_col_2 = 123781) OR
> (bigint_col_2 = 1297839032) OR (bigint_col_2 = 123667123))
> (2 rows)
>
>
> big_table has ~ 100 million rows.
>

there is no nulls here so the index can be used

>
> Considering that NULL::boolean is always false,
>

null::boolean  is null not false. that is because null means 'unknown
value'  not false nor true

template1=# select null::boolean;
 bool
------

(1 fila)


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: Bad plan when null is in an "in" list

От
Tom Lane
Дата:
Jaime Casanova <systemguards@gmail.com> writes:
> On 7/26/05, Csaba Nagy <nagy@ecircle-ag.com> wrote:
>> Seq Scan on big_table  (cost=0.00..2447201.85 rows=448 width=16)
>> Filter: ((bigint_col_2 = 12132131::bigint) OR NULL::boolean)
>>
>> Considering that NULL::boolean is always false,

> null::boolean  is null not false. that is because null means 'unknown
> value'  not false nor true

The reason the planner ends up with this and not just "bigint_col_2 =
12132131::bigint" is that it's using a general-purpose expression
simplifier, and in the general case we have to keep the NULL arm of
the OR because it can affect the result (the OR output will be NULL
not FALSE if the equality is false).

In the context of the top level of a WHERE clause, we could discard the
NULL, and then the OR, since we do not need to distinguish NULL and
FALSE results.  However, as far as I can see doing this would require an
extra pass over the WHERE clause (it can't readily be folded into any of
the existing traversals because those are done by routines that have
other uses where dropping NULLs would be wrong).  I'm unconvinced that
the use-case for this justifies that much overhead ...

            regards, tom lane