Re: [PERFORM] Querying with multicolumn index

Поиск
Список
Период
Сортировка
От Daniel Blanch Bataller
Тема Re: [PERFORM] Querying with multicolumn index
Дата
Msg-id 3AE2FE1F-58FB-4E71-803A-1CE9206D1615@gmail.com
обсуждение исходный текст
Ответ на [PERFORM] Querying with multicolumn index  (Eric Jiang <eric@doublemap.com>)
Ответы Re: [PERFORM] Querying with multicolumn index  (Eric Jiang <eric@doublemap.com>)
Список pgsql-performance
Hi

As a quick fix: Have you considered dropping ix_updates_time index?

I’ve been able to reproduce the issue, but with bigger numbers than you. When I dropped ix_updates_time it went much
muchfaster. It uses updates_driver_id_time_idx index instead. 

For some reason the planner does not make a good estimation in this case. Can you show us EXPLAIN (ANALYZE, BUFFERS)
beforeand after dropping ix_updates_time index? Can you show us too the output of SHOW shared_buffers; ?  

I suspect the issue has to do with low shared_buffers configuration and cache misses, and maybe some costs suboptimal
configurationI’ll try to find it out, if anyone can enlighten us it will be very welcomed. 


P.S. Meanwhile If you still need 'time' index, you can create an index using ‘time' and ‘customer_id' in this order.



Cheers,

Daniel Blanch.


> El 9 dic 2016, a las 18:00, Eric Jiang <eric@doublemap.com> escribió:
>
> Hi all,
> I have a query that I *think* should use a multicolumn index, but
> sometimes isn't, resulting in slow queries.
>
> We have a DB that records GPS coordinates for vehicles:
>
>                               Table "public.updates"
>   Column   |           Type           |                      Modifiers
> ------------+--------------------------+------------------------------------------------------
> id         | integer                  | not null default
> nextval('updates_id_seq'::regclass)
> driver_id  | integer                  | not null
> latitude   | double precision         | not null
> longitude  | double precision         | not null
> time       | timestamp with time zone | not null default now()
> vehicle_id | integer                  |
> Indexes:
>    "updates_pkey" PRIMARY KEY, btree (id)
>    "ix_updates_time" btree ("time")
>    "updates_driver_id_time_idx" btree (driver_id, "time")
>    "updates_vehicle_id_time_idx" btree (vehicle_id, "time")
>
> Table has about 15M records across 100 distinct driver_id.
>
> I want to get the last record for a specific driver:
>
> SELECT * FROM updates WHERE driver_id=123 ORDER BY "time" DESC LIMIT 1;
>
> For some values of driver_id, it does what I expect and uses
> updates_driver_id_time_idx to fetch the records in 2 ms or less. For
> other values of driver_id, it does an index scan backwards on
> ix_updates_time, taking upwards of 2 minutes.
>
> Good plan:
>
> Limit  (cost=0.11..1.38 rows=1 width=56) (actual time=2.710..2.710
> rows=1 loops=1)
>   ->  Index Scan Backward using updates_driver_id_time_idx on updates
> (cost=0.11..139278.28 rows=110051 width=56) (actual time=2.709..2.709
> rows=1 loops=1)
>         Index Cond: (driver_id = 17127)
> Total runtime: 2.732 ms
> (4 rows)
>
> Bad plan:
>
> Limit  (cost=0.09..0.69 rows=1 width=56) (actual
> time=216769.111..216769.112 rows=1 loops=1)
>   ->  Index Scan Backward using ix_updates_time on updates
> (cost=0.09..272339.04 rows=448679 width=56) (actual
> time=216769.110..216769.110 rows=1 loops=1)
>         Filter: (driver_id = 30132)
>         Rows Removed by Filter: 5132087
> Total runtime: 216769.174 ms
>
>
> From cursory testing, the difference seems to be based on how many
> total rows there are for a particular driver. The above query uses
> updates_driver_id_time_idx for drivers with less than about 300K rows,
> but uses ix_updates_time for drivers with more than about 300K rows.
>
> Anything we can do to make it do the "right" thing? We are also
> considering denormalizing the data and keeping a "cache" of the same
> data in another table.
>
> pgsql version: 9.3.14 and 9.5.3, already tried vacuum analyze.
>
> Thanks,
> Eric
>
>
> --
> Eric Jiang, DoubleMap
> eric@doublemap.com | www.doublemap.com
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [PERFORM] Querying with multicolumn index
Следующее
От: Eric Jiang
Дата:
Сообщение: Re: [PERFORM] Querying with multicolumn index