[GENERAL] Avoid using index scan backward when limit order desc

Поиск
Список
Период
Сортировка
От Christophe Escobar
Тема [GENERAL] Avoid using index scan backward when limit order desc
Дата
Msg-id CAN3Cu8VfFrHU+ac_OyA3FdeDxp0Jb2LtH_krj0W3ciJ-Ysd=fA@mail.gmail.com
обсуждение исходный текст
Ответы Re: [GENERAL] Avoid using index scan backward when limit order desc
Re: [GENERAL] Avoid using index scan backward when limit order desc
Список pgsql-general
Hi,

I am using PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit.

I have a notification table with about ~45 000 000 rows.

I have some performance issues when trying to fetch rows from the table with a specific query, I suspect the planner to choose the wrong index because of the limit.
The query look like this: SELECT * FROM notifications WHERE bucket_id IN (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC LIMIT 20;
It returns 0 rows for this example.

The table looks like this with the indices:

Table "notifications"
Column           |            Type             |                         Modifiers
----------------------------+-----------------------------+------------------------------------------------------------
id                         | integer                     | not null default nextval('notifications_id_seq'::regclass)
account_id                 | integer                     |
version_id                 | integer                     |
item_id                    | integer                     |
type                       | character varying(255)      |
created_at                 | timestamp without time zone |
updated_at                 | timestamp without time zone |
meta_data                  | text                        |
bucket_id                  | integer                     |
Indexes:
"notifications_pkey" PRIMARY KEY, btree (id)
"index_notifications_on_account_id" btree (account_id)
"index_notifications_on_bucket_id" btree (bucket_id)
"index_notifications_on_item_id" btree (item_id)
"index_notifications_on_created_at_and_bucket_id" btree (created_at, bucket_id)
"index_notifications_on_type_and_bucket_id" btree (type, bucket_id)
"index_notifications_on_version_id" btree (version_id)

Before testing, I have run a VACUUM ANALYZE, and here are the statistics on the indices:

SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname LIKE '%notifications%';
                      relname                      | relkind |  reltuples  | relpages
---------------------------------------------------+---------+-------------+----------
 notifications_pkey                                | i       | 4.55221e+07 |   124820
 index_notifications_on_account_id                 | i       | 4.55221e+07 |   124820
 index_notifications_on_bucket_id                  | i       | 4.55221e+07 |   124819
 index_notifications_on_item_id                    | i       | 4.55221e+07 |   124821
 index_notifications_on_version_id                 | i       | 4.55221e+07 |   124821
 index_notifications_on_created_at_and_bucket_id   | i       | 4.55221e+07 |   175281
 index_notifications_on_type_and_bucket_id         | i       | 4.55221e+07 |   188281
 notifications_id_seq                              | S       |           1 |        1
 notifications                                     | r       | 4.55225e+07 |   566412


I tried three different EXPLAIN ANALYZE, on a subset of my table (with the entire table, I have yet to see what is the total duration of the query when using LIMIT 20, but it takes more than 5 minutes which is not acceptable for my use case).

** Without limit **

EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC;

------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7474.92..7480.08 rows=2061 width=187) (actual time=0.149..0.149 rows=0 loops=1)
   Sort Key: created_at
   Sort Method: quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on notifications  (cost=71.68..7361.47 rows=2061 width=187) (actual time=0.136..0.136 rows=0 loops=1)
         Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
         ->  Bitmap Index Scan on index_notifications_on_type_and_bucket_id  (cost=0.00..71.16 rows=2061 width=0) (actual time=0.135..0.135 rows=0 loops=1)
               Index Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
 Planning time: 0.715 ms
 Execution time: 0.198 ms

** With LIMIT 20 **

EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC limit 20;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..3341.84 rows=20 width=187) (actual time=60133.701..60133.701 rows=0 loops=1)
   ->  Index Scan Backward using index_notifications_on_created_at_and_bucket_id on notifications  (cost=0.43..344332.66 rows=2061 width=187) (actual time=60133.695..60133.695 rows=0 loops=1)
         Filter: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
         Rows Removed by Filter: 3441510
 Planning time: 1.034 ms
 Execution time: 60133.740 ms

** With limit 50 **

EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC limit 50;

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7429.94..7430.06 rows=50 width=187) (actual time=0.111..0.111 rows=0 loops=1)
   ->  Sort  (cost=7429.94..7435.09 rows=2061 width=187) (actual time=0.110..0.110 rows=0 loops=1)
         Sort Key: created_at
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on notifications  (cost=71.68..7361.47 rows=2061 width=187) (actual time=0.107..0.107 rows=0 loops=1)
               Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
               ->  Bitmap Index Scan on index_notifications_on_type_and_bucket_id  (cost=0.00..71.16 rows=2061 width=0) (actual time=0.105..0.105 rows=0 loops=1)
                     Index Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[])))
 Planning time: 0.151 ms
 Execution time: 0.139 ms


As you can see, when I have the LIMIT 20, the execution time takes around 1 minutes (on a very small subset of the entire table).
Actually I have tried different LIMIT, and when the LIMIT is <= 45, it will use the index scan backward.

Removing the index 'index_notifications_on_created_at_and_bucket_id' may prevent the planner from choosing the index scan backward for this query, but this index is used for other querying on that table...

1) Why is the planner changing index scanning at the threshold of 45 for the LIMIT ? Why not 50 ? 100 ? I may take the solution in my application to have a LIMIT > 45 in order to prevent the performance issue, but am I sure that this threshold will always be the same ?

2) Is it possible for a specific query to force the planner on choosing a given index or preventing it from choosing one ?

What kind of other options do I have to solve this performance issue ?

Thanks in advance for any help,

Regards,

--
Christophe Escobar

В списке pgsql-general по дате отправления:

Предыдущее
От: Joseph Kregloh
Дата:
Сообщение: Re: [GENERAL] Allow login on slave only
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: [GENERAL] Is there a way to Send attachments with email usingpgmail postgreSQl?