Re: Slow WAL recovery for DROP TABLE

From Sherrylyn Branchaw
Subject Re: Slow WAL recovery for DROP TABLE
Date
Msg-id CAB_myF7Z+8yYPgaDXBLFDbQvEtPi=0-fzM6+O_izJZpTSvgWOA@mail.gmail.com
In reply to Re: Slow WAL recovery for DROP TABLE  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
> There was a recent commit for a similar performance problem, which will appear in 9.6.10.  But that was specifically for cases where there were multiple dropped tables per transaction, and large shared_buffers.

Interesting, and good to know, thanks! I'm not sure we fall under either (is 8 GB large? It's larger than the default, but I always thought large was defined as "more than 8GB" for this setting), but it sounds like this sort of problem is on the developers' radars. It's possible up to 4 tables were dropped per transaction in prod, but I don't know if that's enough to count as "multiple", and in testing, I reproduced the problem with 1 drop per transaction.

> I can't reproduce your single-drop-per-transaction problem.  The replica has no problem keeping up with the master.

It's possible that the problem only occurs when the replica is on inferior hardware; I was unable to test equal servers in the time I had. I noticed that when the superior server was the replica, it was able to keep up with the inferior primary, but that dropping tables was the only action for which the inferior server, acting as a standby, wasn't able to keep up, and the only action for which the standalone outperformed the replica. I did not test truncates; it's possible I would have seen the same problem with them.
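In case it helps anyone repeating the comparison: on 9.6, watching whether the standby is keeping up can be done with something along these lines on the standby (just a sketch, not the exact monitoring I ran; the xlog functions were renamed to wal in version 10):

-- Run on the standby: how far replay is behind the primary, and the current replay position
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay,
       pg_last_xlog_replay_location()          AS replay_location;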

> Can you share the reproduction scripts

For the table drops, I prepped by running these:

CREATE TABLE IF NOT EXISTS test1 (id int);
CREATE TABLE IF NOT EXISTS test2 (id int);
CREATE TABLE IF NOT EXISTS test3 (id int);
...
CREATE TABLE IF NOT EXISTS test100000 (id int);

Then I dropped with these:

DROP TABLE IF EXISTS test1;
DROP TABLE IF EXISTS test2;
DROP TABLE IF EXISTS test3;
...
DROP TABLE IF EXISTS test100000;
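For anyone who wants to regenerate those files, something like the following produces equivalent scripts (this isn't necessarily how I built mine; create_tables.sql is just a stand-in name, while drop_tables.sql is the file used in the timing command below):

psql -At -d mydb -c "SELECT format('CREATE TABLE IF NOT EXISTS test%s (id int);', i) FROM generate_series(1,100000) AS g(i)" > create_tables.sql
psql -At -d mydb -c "SELECT format('DROP TABLE IF EXISTS test%s;', i) FROM generate_series(1,100000) AS g(i)" > drop_tables.sql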

For the inserts, I did a

CREATE TABLE test (content1 TEXT, content2 TEXT, content3 TEXT);

followed by 300 of these statements, each of which inserts a batch of 10,000 rows with a random 135-character string in each of the three columns:

INSERT INTO test
SELECT
    array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,135)), ''),
    array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,135)), ''),
    array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,135)), '')
FROM (SELECT generate_series(1,10000)) foo;

I loaded and timed each script as a file using 

time psql -d mydb -f drop_tables.sql

> any non-default config settings?  Especially the setting of shared_buffers (on both master and replica, if different)

Our shared_buffers setting is 8 GB on all nodes.

In prod, we had a primary and two standbys. One standby was very similar to the primary in hardware, but older and with fewer CPUs. The other standby was far inferior in hardware. For testing, I used the two standbys, and was able to reproduce the problem when I made the superior standby the primary.

These are the non-default settings on the primary and the standby of comparable hardware (other than changing file and directory paths, which shouldn't affect anything):

listen_addresses = '*'
log_destination = 'stderr, syslog'
log_filename = 'postgresql-%Y-%m-%d.log'
log_line_prefix = '%p [%m]: %u %d %q %h %a %v %x'
syslog_facility = 'local1'
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_timezone = 'US/Central'
log_statement = 'ddl'
track_functions = pl
track_activity_query_size = 4096
timezone = US/Central
client_encoding = UTF8
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
max_connections = 1024
tcp_keepalives_idle = 7200
tcp_keepalives_interval = 75
tcp_keepalives_count = 9
work_mem = '48 MB'
maintenance_work_mem = '1 GB'
max_locks_per_transaction = 8192
random_page_cost = 2.0
effective_cache_size = '94GB'
log_autovacuum_min_duration = 10s
autovacuum_naptime = 2min
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_freeze_max_age = 1000000000
autovacuum_vacuum_cost_delay = 10ms
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.01
wal_buffers = 16MB
bgwriter_lru_maxpages = 250
max_wal_senders = 5
wal_keep_segments = 256
hot_standby = on
log_min_duration_statement = 2s
wal_level = replica
wal_log_hints = on
archive_mode = on
archive_command = 'ssh postgres@backupserver "test ! -f /opt/backup/db1/%f" && rsync -q %p postgres@backupserver:/opt/backup/db1/%f'
max_parallel_workers_per_gather = 4
max_worker_processes = 16

The only different setting on the second standby is effective_cache_size, which is 24 GB instead of 94 GB.

To rule out fetching from the remote archive as the bottleneck, I tried scp-ing a bunch of WAL files to the standby and resetting the restore command to replay from the local archive. Same performance problem, and only when dropping tables.
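Concretely, the local-archive test just meant pointing restore_command at a directory on the standby, along these lines in recovery.conf (the path here is only an example, not my actual layout):

restore_command = 'cp /opt/backup/local_wal/%f %p'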

Best,
Sherrylyn
