Plan differences

Поиск

Список

Период

Сортировка

От	Anton Melser
Тема	Plan differences
Дата	31 декабря 2015 г. 13:13:39
Msg-id	CAKywjPoJS4hQiLTukMDUCvmaA_aJp-W11tvnu-=wnwaL0vuqQA@mail.gmail.com обсуждение исходный текст
Ответы	Re: Plan differences Re: Plan differences
Список	pgsql-performance

Дерево обсуждения

Hi,

I moved a DB between two "somewhat" similar Postgres installs and am getting much worse plans on the second. The DB was dumped via pg_dump (keeping indexes, etc.) and loaded to the new server.

The first (installed via emerge):

select version();

PostgreSQL 9.4rc1 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (Gentoo 4.7.3-r1 p1.4, pie-0.5.5) 4.7.3, 64-bit

The second (installed from the Postgres centos repo) :

select version();

PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit

SHOW ALL; gives identical results on both - I increased several values on both servers:

max_connections = 300

shared_buffers = 16GB

temp_buffers = 128MB

work_mem = 128MB

seq_page_cost = 0.5

random_page_cost = 1.0

effective_cache_size = 16GB

The first machine has 32GB of RAM and 16 cores (Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz) and the second 96GB of RAM and 24 cores (Intel(R) Xeon(R) CPU E5-2430L v2 @ 2.40GHz). I have a series of python scripts (including a Django site) also on the machine but did before also - load shouldn't have changed (there were some external backups on the other machine and on the new machine only my DB + scripts).

dd performance is similar for sizes under the RAM size:

oldserver:~$ dd if=/dev/zero of=output.img bs=8k count=256k

262144+0 records in

262144+0 records out

2147483648 bytes (2.1 GB) copied, 2.04997 s, 1.0 GB/s

oldserver:~$ dd if=/dev/zero of=output.img bs=8k count=1M

1048576+0 records in

1048576+0 records out

8589934592 bytes (8.6 GB) copied, 13.7105 s, 627 MB/s

[newserver ~]$ dd if=/dev/zero of=output.img bs=8k count=256k

262144+0 records in

262144+0 records out

2147483648 bytes (2.1 GB) copied, 2.03452 s, 1.1 GB/s

[newserver ~]$ dd if=/dev/zero of=output.img bs=8k count=1M

1048576+0 records in

1048576+0 records out

8589934592 bytes (8.6 GB) copied, 21.4785 s, 400 MB/s

But significantly better on the new machine over the RAM size:

oldserver:~$ dd if=/dev/zero of=output.img bs=8k count=5M

5242880+0 records in

5242880+0 records out

42949672960 bytes (43 GB) copied, 478.037 s, 89.8 MB/s

[newserver ~]$ dd if=/dev/zero of=output.img bs=8k count=15M

15728640+0 records in

15728640+0 records out

128849018880 bytes (129 GB) copied, 256.748 s, 502 MB/s

I get the following plan on the old machine for a query:

oldserver=# explain analyze select count(0) from (select message_id, count(0) from accepted where message_id like '20151213%' group by message_id having count(0) > 1) as toto;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=734.26..734.27 rows=1 width=0) (actual time=2519.545..2519.546 rows=1 loops=1)

-> GroupAggregate (cost=0.70..452.90 rows=22509 width=46) (actual time=2519.542..2519.542 rows=0 loops=1)

Group Key: accepted.message_id

Filter: (count(0) > 1)

Rows Removed by Filter: 1289815

-> Index Only Scan using idx_accepted2_mid on accepted (cost=0.70..2.72 rows=22509 width=46) (actual time=0.037..1613.982 rows=1289815 loops=1)

Index Cond: ((message_id >= '20151213'::text) AND (message_id < '20151214'::text))

Filter: ((message_id)::text ~~ '20151213%'::text)

Heap Fetches: 1289815

Planning time: 0.325 ms

Execution time: 2519.610 ms

(11 rows)

Time: 2520.534 ms

On the new machine, I was originally getting:

newserver=# explain analyze select count(0) from (select message_id, count(0) from accepted where message_id like '20151213%' group by message_id having count(0) > 1) as toto;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=8018044.22..8018044.23 rows=1 width=0) (actual time=123964.197..123964.197 rows=1 loops=1)

-> GroupAggregate (cost=7935128.17..7988431.35 rows=2369030 width=46) (actual time=123964.195..123964.195 rows=0 loops=1)

Group Key: accepted.message_id

Filter: (count(0) > 1)

Rows Removed by Filter: 1289817

-> Sort (cost=7935128.17..7941050.75 rows=2369030 width=46) (actual time=123112.260..123572.412 rows=1289817 loops=1)

Sort Key: accepted.message_id

Sort Method: external merge Disk: 70920kB

-> Seq Scan on accepted (cost=0.00..7658269.38 rows=2369030 width=46) (actual time=4450.097..105171.191 rows=1289817 loops=1)

Filter: ((message_id)::text ~~ '20151213%'::text)

Rows Removed by Filter: 232872643

Planning time: 0.145 ms

Execution time: 123995.671 ms

But after a vacuum analyze got:

newserver=# explain analyze select count(0) from (select message_id, count(0) from accepted where message_id like '20151213%' group by message_id having count(0) > 1) as toto;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=6210972.24..6210972.25 rows=1 width=0) (actual time=93052.551..93052.551 rows=1 loops=1)

-> GroupAggregate (cost=0.70..6181400.28 rows=2365757 width=46) (actual time=93052.548..93052.548 rows=0 loops=1)

Group Key: accepted.message_id

Filter: (count(0) > 1)

Rows Removed by Filter: 1289817

-> Index Only Scan using idx_accepted2_mid on accepted (cost=0.70..6134085.13 rows=2365757 width=46) (actual time=41992.489..92674.187 rows=1289817 loops=1)

Filter: ((message_id)::text ~~ '20151213%'::text)

Rows Removed by Filter: 232920074

Heap Fetches: 0

Planning time: 0.634 ms

Execution time: 93052.605 ms

(11 rows)

Time: 93078.267 ms

So at least it appears to be using the index (btree, non-unique) - but it's not using the >= + < trick which appears to drastically reduce execution time. messag_ids start with the date. If I manually use > and <, then the plans and approx performance are the same:

newserver=# explain analyze select count(0) from (select message_id, count(0) from accepted where message_id > '20151213' and message_id < '20151214' group by message_id having count(0) > 1) as toto;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=72044.92..72044.93 rows=1 width=0) (actual time=1205.840..1205.840 rows=1 loops=1)

-> GroupAggregate (cost=0.70..57367.34 rows=1174206 width=46) (actual time=1205.838..1205.838 rows=0 loops=1)

Group Key: accepted.message_id

Filter: (count(0) > 1)

Rows Removed by Filter: 1289817

-> Index Only Scan using idx_accepted2_mid on accepted (cost=0.70..33883.22 rows=1174206 width=46) (actual time=7.558..852.394 rows=1289817 loops=1)

Index Cond: ((message_id > '20151213'::text) AND (message_id < '20151214'::text))

Heap Fetches: 91

Planning time: 0.232 ms

Execution time: 1205.890 ms

(10 rows)

Time: 1225.515 ms

Does anyone have any ideas? All data are loaded into this table via copy and no updates are done. Autovacuum settings weren't changed (and is on both). Do I need to increase shared_buffers to half of available memory for the planner to make certain optimisations? Anything else I'm missing or can try? The new server has been running for almost two weeks now so I would have thought things would have had a chance to settle down.

Cheers,

Anton

В списке pgsql-performance по дате отправления:

Предыдущее

От: Pavel Stehule
Дата: 23 декабря 2015 г., 07:50:56
Сообщение: Re: [ADMIN] Connections "Startup"

Следующее

От: Pavel Stehule
Дата: 31 декабря 2015 г., 13:45:25
Сообщение: Re: Plan differences

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Plan differences

Предыдущее

Следующее