Getting rid of a seq scan in query on a large table

Поиск
Список
Период
Сортировка
От Jens Hoffrichter
Тема Getting rid of a seq scan in query on a large table
Дата
Msg-id BANLkTik0KVOg8itw_ud-s8_TtK9kzt-O=Q@mail.gmail.com
обсуждение исходный текст
Ответы Re: Getting rid of a seq scan in query on a large table  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance
Hi everyone,

I'm having trouble getting rid of a sequential scan on a table with roughly 120k entries it. Creation of an index on that particular column which triggers the sequential scan doesn't do anything, VACUUM and ANALYZE has been done on the table.

The table in question has the following definition:

       Column       |           Type           |                            Modifiers
--------------------+--------------------------+------------------------------------------------------------------
 post_id            | bigint                   | not null default nextval('posts_post_id_seq'::regclass)
 forum_id           | bigint                   | not null
 threadlink         | character varying(255)   | not null
 timestamp          | timestamp with time zone | not null
 poster_id          | bigint                   |
 thread_id          | bigint                   | not null
 subject            | text                     | not null
 text               | text                     | not null
 postername         | character varying(255)   |
 internal_post_id   | bigint                   | not null default nextval('posts_internal_post_id_seq'::regclass)
 internal_thread_id | bigint                   |
Indexes:
    "posts_pkey" PRIMARY KEY, btree (internal_post_id)
    "posts_forum_id_key" UNIQUE, btree (forum_id, post_id)
    "idx_internal_thread_id" btree (internal_thread_id)
    "idx_posts_poster_id" btree (poster_id)
Foreign-key constraints:
    "posts_forum_id_fkey" FOREIGN KEY (forum_id) REFERENCES forums(forum_id)
    "posts_internal_thread_id_fkey" FOREIGN KEY (internal_thread_id) REFERENCES threads(internal_thread_id)
    "posts_poster_id_fkey" FOREIGN KEY (poster_id) REFERENCES posters(poster_id)

The query is this:

SELECT threads.internal_thread_id AS threads_internal_thread_id, threads.forum_id AS threads_forum_id, threads.thread_id AS threads_thread_id, threads.title AS threads_title, threads.poster_id AS threads_poster_id, threads.postername AS threads_postername, threads.category AS threads_category, threads.posttype AS threads_posttype                                                                                                                                                           FROM threads JOIN posts ON threads.internal_thread_id = posts.internal_thread_id JOIN posters ON posts.poster_id = posters.poster_id JOIN posters_groups AS posters_groups_1 ON posters.poster_id = posters_groups_1.poster_id JOIN groups ON groups.group_id = posters_groups_1.group_id WHERE groups.group_id = 4 ORDER BY posts.timestamp DESC;

The query plan (with an explain analyze) gives me the following:

                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=13995.93..14006.63 rows=4279 width=108) (actual time=79.927..79.947 rows=165 loops=1)
   Sort Key: posts."timestamp"
   Sort Method:  quicksort  Memory: 50kB
   ->  Nested Loop  (cost=6.97..13737.84 rows=4279 width=108) (actual time=0.605..79.693 rows=165 loops=1)
         ->  Seq Scan on groups  (cost=0.00..1.05 rows=1 width=8) (actual time=0.013..0.014 rows=1 loops=1)
               Filter: (group_id = 4)
         ->  Nested Loop  (cost=6.97..13694.00 rows=4279 width=116) (actual time=0.587..79.616 rows=165 loops=1)
               ->  Hash Join  (cost=6.97..12343.10 rows=4279 width=24) (actual time=0.568..78.230 rows=165 loops=1)
                     Hash Cond: (posts.poster_id = posters.poster_id)
                     ->  Seq Scan on posts  (cost=0.00..11862.12 rows=112312 width=24) (actual time=0.019..60.092 rows=112312 loops=1)
                     ->  Hash  (cost=6.79..6.79 rows=14 width=24) (actual time=0.101..0.101 rows=14 loops=1)
                           ->  Hash Join  (cost=2.14..6.79 rows=14 width=24) (actual time=0.060..0.093 rows=14 loops=1)
                                 Hash Cond: (posters.poster_id = posters_groups_1.poster_id)
                                 ->  Seq Scan on posters  (cost=0.00..3.83 rows=183 width=8) (actual time=0.006..0.023 rows=185 loops=1)
                                 ->  Hash  (cost=1.96..1.96 rows=14 width=16) (actual time=0.025..0.025 rows=14 loops=1)
                                       ->  Seq Scan on posters_groups posters_groups_1  (cost=0.00..1.96 rows=14 width=16) (actual time=0.016..0.021 rows=14 loops=1)
                                             Filter: (group_id = 4)
               ->  Index Scan using threads_pkey on threads  (cost=0.00..0.30 rows=1 width=100) (actual time=0.006..0.007 rows=1 loops=165)
                     Index Cond: (threads.internal_thread_id = posts.internal_thread_id)
 Total runtime: 80.137 ms
(20 rows)

So the big time lost is in this line:

Seq Scan on posts  (cost=0.00..11862.12 rows=112312 width=24) (actual time=0.019..60.092 rows=112312 loops=1)

which I can understand why it slow ;)

But I haven't yet managed to convert the Seq Scan into an Index Scan, and I'm not sure how to continue there.

As I am not a big expert on psql optimization, any input would be greatly appreciated.

Best regards,
Jens

В списке pgsql-performance по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: Cost of creating an emply WAL segment
Следующее
От: Jenish
Дата:
Сообщение: Performance issue with Insert