[PERFORM] Querying with multicolumn index

Поиск
Список
Период
Сортировка
От Eric Jiang
Тема [PERFORM] Querying with multicolumn index
Дата
Msg-id CAOfJSTwzQ7Fx6Yjeg9mFkMsM5OVKPoa=EgkHceGdkr1TWg8vkA@mail.gmail.com
обсуждение исходный текст
Ответы Re: [PERFORM] Querying with multicolumn index
Re: [PERFORM] Querying with multicolumn index
Re: [PERFORM] Querying with multicolumn index
Список pgsql-performance
Hi all,
I have a query that I *think* should use a multicolumn index, but
sometimes isn't, resulting in slow queries.

We have a DB that records GPS coordinates for vehicles:

                               Table "public.updates"
   Column   |           Type           |                      Modifiers
------------+--------------------------+------------------------------------------------------
 id         | integer                  | not null default
nextval('updates_id_seq'::regclass)
 driver_id  | integer                  | not null
 latitude   | double precision         | not null
 longitude  | double precision         | not null
 time       | timestamp with time zone | not null default now()
 vehicle_id | integer                  |
Indexes:
    "updates_pkey" PRIMARY KEY, btree (id)
    "ix_updates_time" btree ("time")
    "updates_driver_id_time_idx" btree (driver_id, "time")
    "updates_vehicle_id_time_idx" btree (vehicle_id, "time")

Table has about 15M records across 100 distinct driver_id.

I want to get the last record for a specific driver:

SELECT * FROM updates WHERE driver_id=123 ORDER BY "time" DESC LIMIT 1;

For some values of driver_id, it does what I expect and uses
updates_driver_id_time_idx to fetch the records in 2 ms or less. For
other values of driver_id, it does an index scan backwards on
ix_updates_time, taking upwards of 2 minutes.

Good plan:

 Limit  (cost=0.11..1.38 rows=1 width=56) (actual time=2.710..2.710
rows=1 loops=1)
   ->  Index Scan Backward using updates_driver_id_time_idx on updates
 (cost=0.11..139278.28 rows=110051 width=56) (actual time=2.709..2.709
rows=1 loops=1)
         Index Cond: (driver_id = 17127)
 Total runtime: 2.732 ms
(4 rows)

Bad plan:

 Limit  (cost=0.09..0.69 rows=1 width=56) (actual
time=216769.111..216769.112 rows=1 loops=1)
   ->  Index Scan Backward using ix_updates_time on updates
(cost=0.09..272339.04 rows=448679 width=56) (actual
time=216769.110..216769.110 rows=1 loops=1)
         Filter: (driver_id = 30132)
         Rows Removed by Filter: 5132087
 Total runtime: 216769.174 ms


From cursory testing, the difference seems to be based on how many
total rows there are for a particular driver. The above query uses
updates_driver_id_time_idx for drivers with less than about 300K rows,
but uses ix_updates_time for drivers with more than about 300K rows.

Anything we can do to make it do the "right" thing? We are also
considering denormalizing the data and keeping a "cache" of the same
data in another table.

pgsql version: 9.3.14 and 9.5.3, already tried vacuum analyze.

Thanks,
Eric


--
Eric Jiang, DoubleMap
eric@doublemap.com | www.doublemap.com


В списке pgsql-performance по дате отправления:

Предыдущее
От: Andrey Povazhnyi
Дата:
Сообщение: Re: Slow query question
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: [PERFORM] performance issue with bitmap index scans on hugeamounts of big jsonb documents