Обсуждение: WHERE vs ORDER BY vs LIMIT why not using the correct index?

Поиск
Список
Период
Сортировка

WHERE vs ORDER BY vs LIMIT why not using the correct index?

От
"D. Dante Lorenso"
Дата:
All,

I have a simple query:

   SELECT tcq_id
   FROM queue q
   WHERE q.status = 'D'
   ORDER BY tcq_id ASC
   LIMIT 1;

What I'm trying to do is find all the items in the queue which have a
status of 'D' and then select the oldest item first.  My data is very
skewed such that there are not many records matching the WHERE clause.

   status  my_count
   D              4
   Q              6
   S             20
   P             74
   F           3294
   C         291206

However, when I explain that query, it's doing an index scan on the
'tcq_id' primary key column instead of using the index on 'status':

------------------------------
     EXPLAIN SELECT tcq_id
     FROM queue q
     WHERE q.status = 'D'
     ORDER BY tcq_id ASC
     LIMIT 1;

  Limit  (cost=0.00..40.09 rows=1 width=8)
    ->  Index Scan using queue_pkey on queue q (cost=0.00..59059.80
rows=1473 width=8)
          Filter: (status = 'D'::bpchar)
------------------------------

But then when I take out the ORDER BY clause, PostgreSQL will use a
different (and proper) index:

------------------------------
     EXPLAIN SELECT tcq_id
     FROM transcode_queue q
     WHERE q.status = 'D'
     LIMIT 1;

  Limit  (cost=0.00..3.81 rows=1 width=8)
    ->  Index Scan using queue_idx_status on queue q (cost=0.00..5618.07
rows=1473 width=8)
          Index Cond: (status = 'D'::bpchar)
------------------------------

I don't understand why the ORDER BY condition would be affecting my
WHERE criteria.  Shouldn't the ordering be done after the filter is
first applied?

I'm wanting: "find the 4 rows where status = 'D' then order those by
tcq_id and return the first one."  But postgresql seems to be choosing:
"order all records by the tcq_id then scan them sequentially and find
the first one matching status = 'D'".

How can I influence the planner's decision while keeping my ORDER BY clause?

After furthing testing, maybe it's not the ORDER BY but the LIMIT that
is causing the poor planner choice?  I tried to do this:

------------------------------
     SELECT tcq_id
     FROM (
       SELECT tcq_id
       FROM queue q
       WHERE q.status = 'D'
     ) x
     ORDER BY x.tcq_id ASC
     LIMIT 1;

  Limit  (cost=0.00..40.09 rows=1 width=8)
    ->  Index Scan using queue_pkey on queue q  (cost=0.00..59059.80
rows=1473 width=8)
          Filter: (status = 'D'::bpchar)
------------------------------

But this results in another wrong index choice.  So, I removed the LIMIT
clause and now it does use the right index:

------------------------------
     SELECT tcq_id
     FROM (
       SELECT tcq_id
       FROM queue q
       WHERE q.status = 'D'
     ) x
     ORDER BY x.tcq_id ASC;

  Sort  (cost=4314.36..4318.05 rows=1473 width=8)
    Sort Key: q.tcq_id
    ->  Bitmap Heap Scan on queue q  (cost=35.71..4236.85 rows=1473 width=8)
          Recheck Cond: (status = 'D'::bpchar)
          ->  Bitmap Index Scan on queue_idx_status  (cost=0.00..35.34
rows=1473 width=0)
                Index Cond: (status = 'D'::bpchar)
------------------------------

Can someone shed some insight here and help me understand what's going on?

-- Dante


Re: WHERE vs ORDER BY vs LIMIT why not using the correct index?

От
Tom Lane
Дата:
"D. Dante Lorenso" <dante@lorenso.com> writes:
> I don't understand why the ORDER BY condition would be affecting my
> WHERE criteria.  Shouldn't the ordering be done after the filter is
> first applied?

No, not necessarily.  The problem you've got is that the planner
thinks there are 1473 rows not 4 that have status = 'D', and that
affects the choice of plan.  Have you ANALYZEd this table lately?

You might need to push up the statistics target to get reasonable
stats for the very infrequent status values ...

            regards, tom lane