Re: index fragmentation on insert-only table with non-unique column

Поиск
Список
Период
Сортировка
От Claudio Freire
Тема Re: index fragmentation on insert-only table with non-unique column
Дата
Msg-id CAGTBQpZ+auG+KhcLghvTecm4-cGGgL8vZb5uA3=47K7kf9RgJw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: index fragmentation on insert-only table with non-unique column  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: index fragmentation on insert-only table with non-unique column  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
On Fri, Jun 3, 2016 at 8:54 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> As a test, I did SET effective_cache_size='1MB', before running explain, and
> still does:
>
> |        ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx on
cdrs_huawei_pgwrecord_2016_05_29 (cost=0.44..1526689.49 rows=8342796 width=355) 
> |              Index Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND
(recordopeningtime< '2016-05-30 00:00:00'::timestamp without time zone)) 
>
> I Set enable_indexscan=0, and got:
>
> |        ->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_29  (cost=168006.10..4087526.04 rows=8342796
width=355)
> |              Recheck Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND
(recordopeningtime< '2016-05-30 00:00:00'::timestamp without time zone)) 
> |              ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx  (cost=0.00..165920.40
rows=8342796width=0) 
> |                    Index Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND
(recordopeningtime< '2016-05-30 00:00:00'::timestamp without time zone)) 
>
> Here's a minimal query which seems to isolate the symptom:
>
> ts=# explain (analyze,buffers) SELECT sum(duration) FROM cdrs_huawei_pgwrecord_2016_05_22 WHERE
recordopeningtime>='2016-05-22'AND recordopeningtime<'2016-05-23'; 
> | Aggregate  (cost=2888731.67..2888731.68 rows=1 width=8) (actual time=388661.892..388661.892 rows=1 loops=1)
> |   Buffers: shared hit=4058501 read=1295147 written=35800
> |   ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_22_recordopeningtime_idx on cdrs_huawei_pgwrecord_2016_05_22
(cost=0.56..2867075.33rows=8662534 w 
> |idth=8) (actual time=0.036..379332.910 rows=8575673 loops=1)
> |         Index Cond: ((recordopeningtime >= '2016-05-22 00:00:00'::timestamp without time zone) AND
(recordopeningtime< '2016-05-23 00:00:00'::timestamp 
> | without time zone))
> |         Buffers: shared hit=4058501 read=1295147 written=35800
> | Planning time: 0.338 ms
> | Execution time: 388661.947 ms
>
> And here's an older one to avoid cache, with enable_indexscan=0
> |ts=# explain (analyze,buffers)  SELECT sum(duration) FROM cdrs_huawei_pgwrecord_2016_05_08 WHERE
recordopeningtime>='2016-05-08'AND recordopeningtime<'2016-05-09'; 
> | Aggregate  (cost=10006286.58..10006286.59 rows=1 width=8) (actual time=44219.156..44219.156 rows=1 loops=1)
> |   Buffers: shared hit=118 read=1213887 written=50113
> |   ->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_08  (cost=85142.24..9985848.96 rows=8175048 width=8)
(actualtime=708.024..40106.062 rows=8179338 loops=1) 
> |         Recheck Cond: ((recordopeningtime >= '2016-05-08 00:00:00'::timestamp without time zone) AND
(recordopeningtime< '2016-05-09 00:00:00'::timestamp without time zone)) 
> |         Rows Removed by Index Recheck: 74909
> |         Heap Blocks: lossy=1213568
> |         Buffers: shared hit=118 read=1213887 written=50113
> |         ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2016_05_08_recordopeningtime_idx1  (cost=0.00..83098.48
rows=8175048width=0) (actual time=706.557..706.557 rows=12135680 loops=1) 
> |               Index Cond: ((recordopeningtime >= '2016-05-08 00:00:00'::timestamp without time zone) AND
(recordopeningtime< '2016-05-09 00:00:00'::timestamp without time zone)) 
> |               Buffers: shared hit=117 read=320
> | Planning time: 214.786 ms
> | Execution time: 44228.874 ms
> |(12 rows)


Correct me if I'm wrong, but this looks like the planner not
accounting for correlation when using bitmap heap scans.

Checking the source, it really doesn't.

So correlated index scans look extra favourable vs bitmap index scans
because bitmap heap scans consider random page costs sans correlation
effects (even though correlation applies to bitmap heap scans as
well). While that sounds desirable a priori, it seems it's hurting
this case quite badly.

I'm not sure there's any simple way of working around that.


В списке pgsql-performance по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: index fragmentation on insert-only table with non-unique column
Следующее
От: sangeetha
Дата:
Сообщение: pg_database_size