Обсуждение: Indexes on NULL's and order by ... limit N queries

Поиск
Список
Период
Сортировка

Indexes on NULL's and order by ... limit N queries

От
Maxim Boguk
Дата:
When i read about 8.3 support indexed queries on NULL values (like rubric_id is NULL) i was really happy.

But reality strike again... look like NULL in WHERE don't allow effective using
index on (rubric_id, pos) for queries like:
... WHERE rubric_id IS NULL ORDER BY pos LIMIT 5


Here is some details about my issue (all tests on fresh loaded/analyzed into empty 8.3.5 DB):


mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id=8 order by pos limit 5;
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=1337.02..1337.03 rows=5 width=28) (actual time=27.556..27.575 rows=5 loops=1)
    ->  Sort  (cost=1337.02..1340.77 rows=1501 width=28) (actual time=27.552..27.558 rows=5 loops=1)
          Sort Key: pos
          Sort Method:  top-N heapsort  Memory: 25kB
          ->  Seq Scan on cluster_weight  (cost=0.00..1312.09 rows=1501 width=28) (actual time=0.058..25.008 rows=1501
loops=1)
                Filter: (rubric_id = 8)
  Total runtime: 27.638 ms
(7 rows)

ok so we need index on (rubric_id, pos), lets add it:


mboguk_billing=# CREATE INDEX cluster_weight_2 on cluster_weight(rubric_id, pos);
CREATE INDEX


And try again:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id=8 order by pos limit 5;
                                                                  QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..1.70 rows=5 width=28) (actual time=0.095..0.122 rows=5 loops=1)
    ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..509.31 rows=1501 width=28) (actual
time=0.090..0.104rows=5 loops=1) 
          Index Cond: (rubric_id = 8)
  Total runtime: 0.176 ms
(4 rows)

Ok... so now query works as intended...

Lets check are index used on search NULL values:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id IS NULL;
                                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1034.21 rows=26435 width=28) (actual
time=0.053..48.123rows=26435 loops=1) 
    Index Cond: (rubric_id IS NULL)
  Total runtime: 85.210 ms
(3 rows)

Yes it is working...

Now lets try main query over NULL:

mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT
5;
                                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=1473.29..1473.30 rows=5 width=28) (actual time=92.220..92.239 rows=5 loops=1)
    ->  Sort  (cost=1473.29..1539.37 rows=26435 width=28) (actual time=92.216..92.223 rows=5 loops=1)
          Sort Key: pos
          Sort Method:  top-N heapsort  Memory: 25kB
          ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1034.21 rows=26435 width=28) (actual
time=0.033..47.333rows=26435 loops=1) 
                Index Cond: (rubric_id IS NULL)
  Total runtime: 92.310 ms
(7 rows)

Ooops... that is surprise...
I wasn't ready see that plan here... and performance difference over 1000.

Looks like when indexed search over NULL's added, planner wasn't learned right way use such index on  "where something
isNULL order by ... limit ..."  
queries.


Thanks for any responses and sorry for not so good English.

--
SY, Maxim Boguk

Re: Indexes on NULL's and order by ... limit N queries

От
Tom Lane
Дата:
Maxim Boguk <mboguk@masterhost.ru> writes:
> Looks like when indexed search over NULL's added, planner wasn't learned right way use such index on  "where
somethingis NULL order by ... limit ..."  
> queries.

There's nothing wrong with the plan; you've just got too many NULLs to
make it worth using the index for that.

            regards, tom lane

Re: Indexes on NULL's and order by ... limit N queries

От
Maxim Boguk
Дата:
Sorry with all my respect to you, you look like wrong.
Here example:

With NULL's:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id IS NULL ORDER BY pos LIMIT
5;
                                                                   QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=1503.75..1503.76 rows=5 width=28) (actual time=93.334..93.353 rows=5 loops=1)
    ->  Sort  (cost=1503.75..1569.84 rows=26435 width=28) (actual time=93.329..93.335 rows=5 loops=1)
          Sort Key: pos
          Sort Method:  top-N heapsort  Memory: 25kB
          ->  Bitmap Heap Scan on cluster_weight  (cost=314.32..1064.67 rows=26435 width=28) (actual time=7.519..48.678
rows=26435loops=1) 
                Recheck Cond: (rubric_id IS NULL)
                ->  Bitmap Index Scan on cluster_weight_2  (cost=0.00..307.72 rows=26435 width=0) (actual
time=7.350..7.350rows=26435 loops=1) 
                      Index Cond: (rubric_id IS NULL)
  Total runtime: 93.433 ms
(9 rows)

Now lets change NULL's to -1
mboguk_billing=# UPDATE cluster_weight set rubric_id=-1 where rubric_id IS NULL;
UPDATE 26435

And ANALYZE
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

And try same query with -1 instead of NULL:
mboguk_billing=# EXPLAIN ANALYZE SELECT * from cluster_weight where cluster_weight.rubric_id=-1 ORDER BY pos LIMIT 5;
                                                                   QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..0.25 rows=5 width=28) (actual time=0.056..0.080 rows=5 loops=1)
    ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1334.41 rows=26435 width=28) (actual
time=0.053..0.065rows=5 loops=1) 
          Index Cond: (rubric_id = (-1))
  Total runtime: 0.133 ms
(4 rows)


And plan become normal. So issue not with too many NULL's in my dataset.


--
SY, Maxim Boguk

Tom Lane wrote:
> Maxim Boguk <mboguk@masterhost.ru> writes:
>> Looks like when indexed search over NULL's added, planner wasn't learned right way use such index on  "where
somethingis NULL order by ... limit ..."  
>> queries.
>
> There's nothing wrong with the plan; you've just got too many NULLs to
> make it worth using the index for that.
>
>             regards, tom lane

Re: Indexes on NULL's and order by ... limit N queries

От
Alvaro Herrera
Дата:
Maxim Boguk wrote:
> Sorry with all my respect to you, you look like wrong.

The difference is that the plan with -1 does not need to sort the
output, because it comes sorted out of the index; so the execution can
be stopped as soon as 5 tuples have come out.  With NULL, that can't be
done.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Indexes on NULL's and order by ... limit N queries

От
Maxim Boguk
Дата:
Alvaro Herrera wrote:
> Maxim Boguk wrote:
>> Sorry with all my respect to you, you look like wrong.
>
> The difference is that the plan with -1 does not need to sort the
> output, because it comes sorted out of the index; so the execution can
> be stopped as soon as 5 tuples have come out.  With NULL, that can't be
> done.

But why? NULL's have some special representation in index which don't work same as normal values?
Eg output with rubric_id is NULL dont come sorted from index?

Really my tests show same behavior of -1 and NULL values:

mboguk_billing=# SELECT pos from cluster_weight where rubric_id=-1 limit 20;
  pos
-----
   20
   20
   25
   40
   40
   50
   60
   60
   80
   80
  100
  120
  140
  160
  180
  200
  220
  240
  260
  280
(20 rows)

mboguk_billing=# UPDATE cluster_weight set rubric_id=NULL where rubric_id=-1;
UPDATE 26435
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE
mboguk_billing=# SELECT pos from cluster_weight where rubric_id is NULL limit 20;
  pos
-----
   20
   20
   25
   40
   40
   50
   60
   60
   80
   80
  100
  120
  140
  160
  180
  200
  220
  240
  260
  280
(20 rows)

Eg output with rubric_id is NULL come ordered be pos from index (rubric_id, pos)

(
Here is explains:

mboguk_billing=# EXPLAIN ANALYZE SELECT pos from cluster_weight where rubric_id is NULL limit 20;
                                                                   QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..0.99 rows=20 width=2) (actual time=0.050..0.144 rows=20 loops=1)
    ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1314.94 rows=26435 width=2) (actual
time=0.045..0.082rows=20 loops=1) 
          Index Cond: (rubric_id IS NULL)
  Total runtime: 0.214 ms
(4 rows)

mboguk_billing=# UPDATE cluster_weight set rubric_id=-1 where rubric_id IS NULL;
UPDATE 26435
mboguk_billing=# ANALYZE cluster_weight;
ANALYZE

mboguk_billing=# EXPLAIN ANALYZE SELECT pos from cluster_weight where rubric_id=-1 limit 20;
                                                                   QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..0.95 rows=20 width=2) (actual time=0.050..0.141 rows=20 loops=1)
    ->  Index Scan using cluster_weight_2 on cluster_weight  (cost=0.00..1259.05 rows=26435 width=2) (actual
time=0.045..0.081rows=20 loops=1) 
          Index Cond: (rubric_id = (-1))
  Total runtime: 0.214 ms
(4 rows)

Plans look same.
)

PS: REINDEX do not change situation.

--
SY Maxim Boguk

Re: Indexes on NULL's and order by ... limit N queries

От
Tom Lane
Дата:
Maxim Boguk <mboguk@masterhost.ru> writes:
> But why? NULL's have some special representation in index which don't work same as normal values?

In general, NULLs don't work the same as normal values, no.

The reason this particular query isn't working as you are expecting is
that "foo IS NULL" isn't seen as an ordering constraint by the planner's
pathkey machinery, and so the query doesn't appear to match the index
order.  You could work around it by explicitly specifying a matching
ordering:

    SELECT * from cluster_weight where cluster_weight.rubric_id IS NULL
    ORDER BY rubric_id, pos LIMIT 5;
                 ^^^^^^^^^^


            regards, tom lane