Обсуждение: Wrong index used when ORDER BY LIMIT 1

Поиск
Список
Период
Сортировка

Wrong index used when ORDER BY LIMIT 1

От
Szűcs Gábor
Дата:
Dear Gurus,

Version: 7.4.6

I use a query on a heavily indexed table which picks a wrong index
unexpectedly. Since this query is used in response to certain user
interactions thousands of times in succession (with different constants),
500ms is not affordable for us. I can easily work around this, but I'd like
to understand the root of the problem.

Basically, there are two relevant indexes:
- muvelet_vonalkod_muvelet btree (muvelet, ..., idopont)
- muvelet_vonalkod_pk3 btree (idopont, ...)

Query is:
SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.

I expected the planner to choose the index on muvelet, then sort by idopont.
Instead, it took the other index. I think there is heavy correlation since
muvelet references to a sequenced pkey and idopont is a timestamp (both
increase with passing time). May that be a cause?

See full table description and explain analyze results at end of the email.


TIA,
--
G.

---- table :
            Table "public.muvelet_vonalkod"
    Column   |           Type           |  Modifiers
------------+--------------------------+-----------------------------------
  az         | integer                  | not null def. nextval('...')
  olvaso_nev | character varying        | not null
  vonalkod   | character varying        | not null
  mozgasnem  | integer                  | not null
  idopont    | timestamp with time zone | not null
  muvelet    | integer                  |
  minoseg    | integer                  | not null
  cikk       | integer                  |
  muszakhely | integer                  |
  muszakkod  | integer                  |
  muszaknap  | date                     |
  repre      | boolean                  | not null default false
  hiba       | integer                  | not null default 0
Indexes:
     "muvelet_vonalkod_pkey" primary key, btree (az)
     "muvelet_vonalkod_pk2" unique, btree (olvaso_nev, idopont)
     "muvelet_vonalkod_muvelet" btree
         (muvelet, mozgasnem, vonalkod, olvaso_nev, idopont)
     "muvelet_vonalkod_pk3" btree (idopont, olvaso_nev)
     "muvelet_vonalkod_vonalkod" btree
         (vonalkod, mozgasnem, olvaso_nev, idopont)
Foreign-key constraints:
     "$1" FOREIGN KEY (mozgasnem) REFERENCES mozgasnem(az)
     "$2" FOREIGN KEY (muvelet) REFERENCES muvelet(az)
     "$3" FOREIGN KEY (minoseg) REFERENCES minoseg(az)
     "$4" FOREIGN KEY (cikk) REFERENCES cikk(az)
     "$5" FOREIGN KEY (muszakhely) REFERENCES hely(az)
     "$6" FOREIGN KEY (muszakkod) REFERENCES muszakkod(az)
     "muvelet_vonalkod_muszak_fk"
        FOREIGN KEY (muszakhely, muszaknap, muszakkod)
        REFERENCES muszak(hely, nap, muszakkod)
Triggers:
     muvelet_vonalkod_aiud AFTER INSERT OR DELETE OR UPDATE ON
muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_aiud()
     muvelet_vonalkod_biu BEFORE INSERT OR UPDATE ON muvelet_vonalkod FOR
EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_biu()
     muvelet_vonalkod_noty AFTER INSERT OR DELETE OR UPDATE ON
muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_noty()


-- original query, limit
# explain analyze
   select idopont from muvelet_vonalkod
   where muvelet=6859 order by idopont
   limit 1;
                                QUERY PLAN

----------------------------------------------------------------------------
  Limit  (cost=0.00..25.71 rows=1 width=8) (actual time=579.528..579.529
rows=1 loops=1)
    ->  Index Scan using muvelet_vonalkod_pk3 on muvelet_vonalkod
(cost=0.00..8304.42 rows=323 width=8) (actual time=579.522..579.522 rows=1
loops=1)
          Filter: (muvelet = 6859)
  Total runtime: 579.606 ms
(4 rows)

-- however, if I omit the limit clause:
# explain analyze
   select idopont from muvelet_vonalkod
   where muvelet=6859 order by idopont;
                                QUERY PLAN

---------------------------------------------------------------------------
  Sort  (cost=405.41..405.73 rows=323 width=8) (actual time=1.295..1.395
rows=360 loops=1)
    Sort Key: idopont
    ->  Index Scan using muvelet_vonalkod_muvelet on muvelet_vonalkod
(cost=0.00..400.03 rows=323 width=8) (actual time=0.049..0.855 rows=360 loops=1)
          Index Cond: (muvelet = 6859)
  Total runtime: 1.566 ms
(5 rows)

-- workaround 1: the planner is hard to trick...
# explain analyze
   select idopont from
   (select idopont from muvelet_vonalkod
    where muvelet=6859) foo
   order by idopont limit 1;
                                QUERY PLAN

---------------------------------------------------------------------------
  Limit  (cost=0.00..25.71 rows=1 width=8) (actual time=584.403..584.404
rows=1 loops=1)
    ->  Index Scan using muvelet_vonalkod_pk3 on muvelet_vonalkod
(cost=0.00..8304.42 rows=323 width=8) (actual time=584.397..584.397 rows=1
loops=1)
          Filter: (muvelet = 6859)
  Total runtime: 584.482 ms
(4 rows)

-- workaround 2: quite ugly but seems to work (at least for this
-- one test case):
# explain analyze
   select idopont from
   (select idopont from muvelet_vonalkod
    where muvelet=6859 order by idopont) foo
   order by idopont limit 1;
                                QUERY PLAN

---------------------------------------------------------------------------
  Limit  (cost=405.41..405.42 rows=1 width=8) (actual time=1.754..1.755
rows=1 loops=1)
    ->  Subquery Scan foo  (cost=405.41..407.35 rows=323 width=8) (actual
time=1.751..1.751 rows=1 loops=1)
          ->  Sort  (cost=405.41..405.73 rows=323 width=8) (actual
time=1.746..1.746 rows=1 loops=1)
                Sort Key: idopont
                ->  Index Scan using muvelet_vonalkod_muvelet on
muvelet_vonalkod  (cost=0.00..400.03 rows=323 width=8) (actual
time=0.377..1.359 rows=360 loops=1)
                      Index Cond: (muvelet = 6859)
  Total runtime: 1.853 ms
(7 rows)


Re: Wrong index used when ORDER BY LIMIT 1

От
Michael Fuhr
Дата:
On Wed, Dec 21, 2005 at 07:03:00PM +0100, Sz?cs Gbor wrote:
> Version: 7.4.6
[...]
> Query is:
> SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.
>
> I expected the planner to choose the index on muvelet, then sort by idopont.
> Instead, it took the other index.

I think the planner is guessing that since you're ordering on
idopont, scanning the idopont index will find the first matching
row faster than using the muvelet index would.  In many cases that's
a good bet, but in this case the guess is wrong and you end up with
a suboptimal plan.

I just ran some tests with 8.1.1 and it chose the better plan for
a query similar to what you're doing.  One of the developers could
probably explain why; maybe it's because of the changes that allow
better use of multicolumn indexes.  Try 8.1.1 if you can and see
if you get better results.

> -- workaround 2: quite ugly but seems to work (at least for this
> -- one test case):
> # explain analyze
>   select idopont from
>   (select idopont from muvelet_vonalkod
>    where muvelet=6859 order by idopont) foo
>   order by idopont limit 1;

Another workaround is to use OFFSET 0 in the subquery.

--
Michael Fuhr

Re: Wrong index used when ORDER BY LIMIT 1

От
Tom Lane
Дата:
=?ISO-8859-2?Q?Sz=FBcs_G=E1bor?= <surrano@gmail.com> writes:
> Query is:
> SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.

Much the best solution for this would be to have an index on
    (muvelet, idopont)
--- perhaps you can reorder the columns of "muvelet_vonalkod_muvelet"
instead of making a whole new index --- and then say

    SELECT idopont WHERE muvelet = x ORDER BY muvelet, idopont LIMIT 1

PG 8.1 can apply such an index to your original query, but older
versions will need the help of the modified ORDER BY to recognize
that the index is usable.

            regards, tom lane

Re: Wrong index used when ORDER BY LIMIT 1

От
Szűcs Gábor
Дата:
Dear Tom,

On 2005.12.21. 20:34, Tom Lane wrote:
> =?ISO-8859-2?Q?Sz=FBcs_G=E1bor?= <surrano@gmail.com> writes:
>> Query is:
>> SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.
>
> Much the best solution for this would be to have an index on
>     (muvelet, idopont)
> --- perhaps you can reorder the columns of "muvelet_vonalkod_muvelet"
> instead of making a whole new index --- and then say
>
>     SELECT idopont WHERE muvelet = x ORDER BY muvelet, idopont LIMIT 1

I was far too tired yesterday evening to produce such a clean solution but
finally came to this conclusion this morning :) Even without the new index,
it picks the index on muvelet, which decreases time to ~1.5ms. The new index
takes it down to 0.1ms.

However, this has a problem; namely, what if I don't (or can't) tell the
exact int value in the WHERE clause? In general: will the following query:

   SELECT indexed_ts_field FROM table WHERE indexed_int_field IN (100,200)
   -- or even: indexed_int_field BETWEEN 100 AND 200
   ORDER BY indexed_ts_field LIMIT n

always pick the index on the timestamp field, or does it depend on something
else, say the limit size n and the attributes' statistics?

> PG 8.1 can apply such an index to your original query, but older
> versions will need the help of the modified ORDER BY to recognize
> that the index is usable.

So the direct cause is that 7.x planners prefer ORDER BY to WHERE when
picking indexes? But only when there is a LIMIT clause present?

I'd like to know how much of our code should I review; if it's explicitly
connected to LIMIT, I'd probably have to check far less code.

--
G.