Обсуждение: Bad plan choices & statistic targets with a GIN index

Поиск
Список
Период
Сортировка

Bad plan choices & statistic targets with a GIN index

От
Dieter Komendera
Дата:
Hi all,

hope this is the right list to post to.
We saw some bad choices from the query planner regarding the use of a GIN index which got worse over time and
performancestarted degrading seriously, so I did some digging and I found a solution which works, but I'd like to get
someopinion on. 

Here is the table in question:

                                                            Table "public.games"
      Column      |            Type             |                     Modifiers                      | Storage  | Stats
target| Description  

------------------+-----------------------------+----------------------------------------------------+----------+--------------+-------------
 id               | integer                     | not null default nextval('games_id_seq'::regclass) | plain    |
      |  
 runners          | smallint                    |                                                    | plain    |
      |  
 player_id        | integer                     |                                                    | plain    | 1000
      |  
 partner1_id      | integer                     |                                                    | plain    | 1000
      |  
 partner2_id      | integer                     |                                                    | plain    | 1000
      |  
 partner3_id      | integer                     |                                                    | plain    | 1000
      |  
 created_at       | timestamp without time zone |                                                    | plain    |
      |  
Indexes:
    "games_pkey" PRIMARY KEY, btree (id)
    "index_games_on_created_at" btree (created_at)
    "index_games_participants" gin ((ARRAY[player_id, partner1_id, partner2_id, partner3_id])) WITH (fastupdate=off)
Has OIDs: no

I removed some columns from the output for clarity,. It has 300+ million rows. And is freshly analyzed.
As you see, I've already increased the stats targets for the columns which go into the GIN index before, but this had
novisible effect on query plan choices. 
Here's a typical query:

EXPLAIN (analyze, buffers) SELECT  "games".* FROM "games"  WHERE (ABS(runners) >= '3') AND ((ARRAY[player_id,
partner1_id,partner2_id, partner3_id]) @> ARRAY[166866])  ORDER BY id DESC LIMIT 20 OFFSET 0; 
                                                                     QUERY PLAN
                            

----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..13639.64 rows=20 width=74) (actual time=330.271..12372.777 rows=20 loops=1)
   Buffers: shared hit=3453594 read=119394
   ->  Index Scan Backward using games_pkey on games  (cost=0.57..15526034.64 rows=22767 width=74) (actual
time=330.269..12372.763rows=20 loops=1) 
         Filter: ((ARRAY[player_id, partner1_id, partner2_id, partner3_id] @> '{166866}'::integer[]) AND (abs(runners)
>=3::smallint)) 
         Rows Removed by Filter: 3687711
         Buffers: shared hit=3453594 read=119394
 Total runtime: 12372.848 ms
(7 rows)


This is plan is not the best choice, though. It would be much more efficient to use the index_games_participants index.
Forsome queries, there would be not enough records which fullfill the conditions so bascially every row of the table is
scanned.
As \d+ index_games_participants showed that the index had an "array" column, I found this:

SELECT attname, attstattarget from pg_attribute WHERE attrelid = (SELECT oid FROM pg_class WHERE relname =
'index_games_participants');
 attname | attstattarget
---------+---------------
 array   |            -1
(1 row)


Also, I noticed that for that "array" GIN index column there is content in pg_statistics, where as for the btree
indicesthere isn't. 
Because I didn't find any documentation or references on setting statistic targets on indices, I just gave it a shot:

ALTER TABLE index_games_participants ALTER COLUMN "array" SET STATISTICS 1000;

After running ANALYZE on the table:

EXPLAIN (analyze, buffers) SELECT  "games".* FROM "games"  WHERE (ABS(runners) >= '3') AND ((ARRAY[player_id,
partner1_id,partner2_id, partner3_id]) @> ARRAY[166866])  ORDER BY id DESC LIMIT 20 OFFSET 0; 

                                                                      QUERY PLAN
                               

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=33947.27..33947.32 rows=20 width=74) (actual time=624.308..624.341 rows=20 loops=1)
   Buffers: shared hit=4 read=17421
   ->  Sort  (cost=33947.27..33961.61 rows=5736 width=74) (actual time=624.306..624.318 rows=20 loops=1)
         Sort Key: id
         Sort Method: top-N heapsort  Memory: 27kB
         Buffers: shared hit=4 read=17421
         ->  Bitmap Heap Scan on games  (cost=164.49..33794.64 rows=5736 width=74) (actual time=6.704..621.592
rows=1963loops=1) 
               Recheck Cond: (ARRAY[player_id, partner1_id, partner2_id, partner3_id] @> '{166866}'::integer[])
               Filter: (abs(runners) >= 3::smallint)
               Rows Removed by Filter: 17043
               Buffers: shared hit=1 read=17421
               ->  Bitmap Index Scan on index_games_participants  (cost=0.00..163.05 rows=17207 width=0) (actual
time=4.012..4.012rows=19300 loops=1) 
                     Index Cond: (ARRAY[player_id, partner1_id, partner2_id, partner3_id] @> '{166866}'::integer[])
                     Buffers: shared hit=1 read=19
 Total runtime: 624.572 ms
(15 rows)

Much better! This reduced the bad plan choices substantially.
Also, as one could expect, SELECT * from pg_statistic WHERE starelid = (SELECT oid FROM pg_class WHERE relname =
'index_games_participants');now had much more data. 

Is this a good idea? Am I missing something? Or should the GIN index actually use the statistic targets derived from
thetable columns it depends on? 

Best,
Dieter



Re: Bad plan choices & statistic targets with a GIN index

От
Tom Lane
Дата:
Dieter Komendera <dieter@komendera.com> writes:
> Because I didn't find any documentation or references on setting statistic targets on indices, I just gave it a shot:

> ALTER TABLE index_games_participants ALTER COLUMN "array" SET STATISTICS 1000;

This works, and will help if the planner can make use of statistics on the
expression the index is indexing.  The reason it's not documented is that
it's not considered supported (yet), primarily because pg_dump won't dump
such a setting.  And the reason for that is mainly that the column names
of an index aren't guaranteed stable across PG versions.  But as long
as you're willing to remember to restore the setting manually, it's
a reasonable thing to do if it helps your query plans.

            regards, tom lane