Hello.
> - Which version of postgres is this? Newer versions avoid scanning
> unchanged parts of the heap even for freezing (9.6+, with additional
> smaller improvements in 11).
Oh, totally forgot about version and settings...
server_version 10.9 (Ubuntu 10.9-103)
So the "don't vacuum all-frozen pages" optimization is included.
> - have you increased the vacuum cost limits? Before PG 12 they're so low
> they're entirely unsuitable for larger databases, and even in 12 you
> should likely increase them for a multi-TB database
Current settings are:
autovacuum_max_workers 8
autovacuum_vacuum_cost_delay 5ms
autovacuum_vacuum_cost_limit 400
autovacuum_work_mem -1
vacuum_cost_page_dirty 40
vacuum_cost_page_hit 1
vacuum_cost_page_miss 10
"autovacuum_max_workers" is set to 8 because the server needs to process a lot of changing relations.
The settings were more aggressive previously (autovacuum_vacuum_cost_limit was 2800), but that led to very high I/O load, causing issues with application performance and stability (even on SSD).
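For a rough sense of scale, here is a minimal Python sketch of the I/O ceiling these settings imply. It assumes the default 8 kB block size, charges each page a single cost (in reality a page that is read and then dirtied is charged both costs), and ignores the time vacuum spends actually working, so real throughput is lower. As far as I understand, the autovacuum cost limit is balanced across the running workers, so the figure is roughly a total rather than per worker. The function names are just for illustration.

```python
# Upper bound on vacuum throughput implied by cost-based delay settings.
# Assumption: 8 kB pages (default PostgreSQL block size).
BLOCK_SIZE = 8192


def max_pages_per_second(cost_limit, cost_delay_ms, page_cost):
    """Pages/s if every page costs `page_cost` and vacuum only sleeps between rounds."""
    rounds_per_second = 1000.0 / cost_delay_ms
    return cost_limit * rounds_per_second / page_cost


def max_mib_per_second(cost_limit, cost_delay_ms, page_cost):
    """Same bound expressed in MiB/s."""
    pages = max_pages_per_second(cost_limit, cost_delay_ms, page_cost)
    return pages * BLOCK_SIZE / (1024 * 1024)


if __name__ == "__main__":
    # Compare the current limit (400) with the earlier one (2800), delay 5ms.
    for limit in (400, 2800):
        for name, cost in (("hit", 1), ("miss", 10), ("dirty", 40)):
            rate = max_mib_per_second(limit, 5, cost)
            print(f"cost_limit={limit} page_{name}: {rate:.1f} MiB/s")
```

With the current values (limit 400, delay 5ms, dirty cost 40) the bound for dirtied pages works out to 2000 pages per second.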
"vacuum_cost_page_dirty" was set to 40 a few months ago. High I/O write peaks were causing application requests to get stuck waiting on WALWriteLock.
After some investigation we found that this was caused by peaks in WAL logging.
Those WAL peaks mostly consist of records like these:
Type                      N      (%)  Record size      (%)     FPI size      (%) Combined size      (%)
-------------------------------------------------------------------------------------------------------
Heap2/CLEAN           10520 (  0.86)       623660 (  0.21)      5317532 (  0.53)       5941192 (  0.46)
Heap2/FREEZE_PAGE    113419 (  9.29)      6673877 (  2.26)    635354048 ( 63.12)     642027925 ( 49.31)
Another example:
Heap2/CLEAN          196707 (  6.96)     12116527 (  1.56)    292317231 ( 37.77)     304433758 ( 19.64)
Heap2/FREEZE_PAGE      1819 (  0.06)       104012 (  0.01)     13324269 (  1.72)      13428281 (  0.87)
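To see how much of each record type is full-page images, here is a small Python sketch that parses lines in the layout shown above. It assumes each line has a type followed by four "value (percent)" pairs; `parse_stats_line` and `fpi_share` are hypothetical helper names, not part of any PostgreSQL tool.

```python
import re

# One record-type line: type, then four "value ( pct)" pairs
# (N, record size, FPI size, combined size).
LINE_RE = re.compile(
    r"^(?P<type>\S+)\s+"
    r"(?P<n>\d+)\s+\(\s*[\d.]+\)\s+"
    r"(?P<record>\d+)\s+\(\s*[\d.]+\)\s+"
    r"(?P<fpi>\d+)\s+\(\s*[\d.]+\)\s+"
    r"(?P<combined>\d+)\s+\(\s*[\d.]+\)\s*$"
)


def parse_stats_line(line):
    """Return a dict of counts for one stats line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    row = {"type": m.group("type")}
    for key in ("n", "record", "fpi", "combined"):
        row[key] = int(m.group(key))
    return row


def fpi_share(row):
    """Fraction of the combined size that is full-page images."""
    return row["fpi"] / row["combined"] if row["combined"] else 0.0


if __name__ == "__main__":
    sample = [
        "Heap2/CLEAN 10520 ( 0.86) 623660 ( 0.21) 5317532 ( 0.53) 5941192 ( 0.46)",
        "Heap2/FREEZE_PAGE 113419 ( 9.29) 6673877 ( 2.26) 635354048 ( 63.12) 642027925 ( 49.31)",
    ]
    for line in sample:
        row = parse_stats_line(line)
        print(f"{row['type']}: FPI is {fpi_share(row):.1%} of combined size")
```

In both samples nearly all of the volume for these record types is full-page images rather than the records themselves.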
Thanks,
Michail.