Re: seqscan for 100 out of 3M rows, index present

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	Re: seqscan for 100 out of 3M rows, index present
Дата	26 июня 2013 г. 20:59:48
Msg-id	CAMkU=1xoKeo94UYx3W3v8emPzUnkN6FcotYcaH8Ooyw1HZ6ASg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: seqscan for 100 out of 3M rows, index present (Willy-Bas Loos <willybas@gmail.com>)
Список	pgsql-performance

Дерево обсуждения

On Wed, Jun 26, 2013 at 12:18 PM, Willy-Bas Loos <willybas@gmail.com> wrote:

plan with enable_seqscan off:

Aggregate (cost=253892.48..253892.49 rows=1 width=0) (actual time=208.681..208.681 rows=1 loops=1)

The estimated cost of this is ~4x times greater than the estimated cost for the sequential scan. It should be easy to tweak things to get those to reverse, but will doing so mess up other queries that are currently OK?

-> Nested Loop (cost=5.87..253889.49 rows=1198 width=0) (actual time=69.403..208.647 rows=17 loops=1)

The estimated number of rows is off by 70 fold. Most of this is probably due to cross-column correlations, which you probably can't do much about.

-> Index Scan using geo_blok_idx on geo g (cost=0.00..1314.43 rows=500 width=8) (actual time=45.776..46.147 rows=121 loops=1)
Index Cond: (blok = 1942)

It thinks it will find 500 rows (a suspiciously round number?) but actually finds 121. That is off by a factor of 4. Why does it not produce a better estimate on such a simple histogram-based estimation? Was the table analyzed recently? Have you tried increasing default_statistics_target? If you choose values of blok other than 1942, what are the results like? This estimate feeds into the inner loop estimates multiplicatively, so this is a powerful factor in driving the choice.

        -> Bitmap Heap Scan on bmp_data d (cost=5.87..502.91 rows=179 width=8) (actual time=1.340..1.341 rows=0 loops=121)
              Recheck Cond: (geo_id = g.geo_id)
              -> Bitmap Index Scan on bmp_data_geo_idx (cost=0.00..5.82 rows=179 width=0) (actual time=1.206..1.206 rows=0 loops=121)
                    Index Cond: (geo_id = g.geo_id)
Total runtime: 208.850 ms

So it is only twice as fast as the sequential scan anyway. Were you expecting even more faster? Unless it is the dominant query in your database, I would usually not consider a factor of 2 improvement to be worth worrying about, as it is too likely you will make something else worse in the process.

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: seqscan for 100 out of 3M rows, index present