Re: Seeking guidance on extremely slow pg_restore despite strong I/O performance
| From | pg254kl@georgiou.vip |
|---|---|
| Subject | Re: Seeking guidance on extremely slow pg_restore despite strong I/O performance |
| Date | |
| Msg-id | 176496646088.6.4993161944894462260.1049766710@georgiou.vip |
| In reply to | Seeking guidance on extremely slow pg_restore despite strong I/O performance (MentionTheElephant <MentionTheElephant@gmail.com>) |
| List | pgsql-performance |
If the dump was taken with pg_dump -Fd and pg_restore -j has no effect on restore time, that's a good clue.
You can start by testing deferred checkpoints: set max_wal_size = 1TB and checkpoint_timeout = 10h, and see how this affects the pg_restore (it should then be limited by WAL write throughput). Perhaps also increase wal_buffers to 128MB. The idea is to identify the write choke point by elimination before starting to tune for it.
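A minimal sketch of those settings as postgresql.conf entries, assuming they are applied only for the duration of the restore and reverted afterwards (wal_buffers requires a server restart to change):

max_wal_size = 1TB          # defer WAL-volume-triggered checkpoints
checkpoint_timeout = 10h    # defer time-based checkpoints
wal_buffers = 128MB         # larger WAL buffer for the bulk load

With checkpoints effectively out of the picture, rerunning the restore should show whether the checkpointer was the limiting factor.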
Unrelated to your problem, but you should set the *_io_concurrency settings to 200 since you use SSDs.
pg_restore also rebuilds indexes, so make sure the settings relevant to index building are set appropriately (see max_parallel_* and *_io_concurrency); a sketch follows below.
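A sketch of the index-building settings referred to above; the values are illustrative, not tuned recommendations:

effective_io_concurrency = 200        # SSD-appropriate, per the advice above
maintenance_io_concurrency = 200      # used by maintenance work such as index builds
max_parallel_maintenance_workers = 8  # illustrative; caps parallel workers per index build
max_parallel_workers = 18             # illustrative; overall cap, also bounded by max_worker_processes

maintenance_work_mem = 8GB is already set in the configuration quoted below, which is the other setting that matters most for index builds.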
Kiriakos Georgiou
On Dec 5, 2025, at 5:30 AM, MentionTheElephant - MentionTheElephant at gmail.com <mentiontheelephant_at_gmail_com_xpdkqvpqqa@simplelogin.co> wrote:

Hello,
I would greatly appreciate your insight into an issue where pg_restore
runs significantly slower than expected, even though the underlying
storage shows very high random write throughput. I am trying to
understand which PostgreSQL mechanisms or system layers I should
investigate next in order to pinpoint the bottleneck and improve
restore performance.
The central question is: What should I examine further to understand
why checkpoint processing becomes the dominant bottleneck during
restore, despite fsync=off, synchronous_commit=off, and excellent
random write latency?
Below is a detailed description of the environment, the behavior
observed, the steps I have already taken, and the research performed
so far.
During pg_restore, execution time remains extremely long: more than 2
hours using a custom-format dump and over 4 hours using the directory
format. The machine consistently demonstrates high random write
performance (median latency ~5 ms, ~45k random write IOPS), yet
PostgreSQL logs show very long checkpoints where the write phase
dominates (hundreds to thousands of seconds). Checkpoints appear to
stall the entire restore process.
I have tested multiple combinations of dump formats (custom and
directory) and parallel jobs (j = 1, 12, 18). The restore duration
barely changes. This strongly suggests that the bottleneck is not
client-side parallelism but internal server behavior—specifically the
checkpoint write phase.
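For illustration only, a parallel directory-format restore of the kind
described above would be invoked roughly like this (database name and
dump path are placeholders, not the actual ones used):

pg_restore -j 12 -d targetdb /backups/dump_dir

Since -j only parallelizes the client side (loading independent tables
and building their indexes concurrently), it cannot help if every
worker ends up waiting on the same server-side checkpoint activity,
which is consistent with the observation that changing -j barely
changes the duration.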
Example log excerpts show checkpoint write times consistently in the
range of 600–1100 seconds, with large numbers of buffers written (from
hundreds of thousands to over 1.6 million). Sync times remain
negligible because fsync is disabled, reinforcing the suspicion that
PostgreSQL's internal buffer flushing and write throttling mechanisms
are the source of slowdown, not WAL or filesystem sync.
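As an aside, cumulative checkpoint statistics can also be read from
the pg_stat_checkpointer view (present in recent PostgreSQL releases),
which gives totals across the whole restore rather than individual
log lines; a minimal query, assuming the standard column names:

SELECT num_timed, num_requested, buffers_written, write_time, sync_time
FROM pg_stat_checkpointer;

write_time and sync_time are reported in milliseconds.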
Given that:
* Storage is fast,
* fsync and synchronous commits are disabled,
* full_page_writes is off,
* wal_level is minimal,
* autovacuum is off,
* the restore is the only workload,
I am trying to determine what further PostgreSQL internals or Linux
I/O mechanisms may explain why these checkpoints are taking orders of
magnitude longer than the device’s raw write characteristics would
suggest.
I am particularly looking for guidance on:
* Whether backend or checkpointer write throttling may still be
limiting write concurrency even during bulk restore (see the sketch
after this list),
* Whether XFS on Hyper-V VHDX + LVM + battery-backed SSD could
introduce any serialization invisible to raw I/O tests,
* Whether certain parameters (e.g., effective_io_concurrency,
maintenance_io_concurrency, wal_writer settings, combine limits,
io_uring behavior) could unintentionally reduce write throughput,
* Whether parallel pg_restore is inherently constrained by global
buffer flushing behavior,
* Any other PostgreSQL mechanisms that could cause prolonged
checkpoint write durations even with crash-safety disabled.
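Regarding the write-throttling point above, a minimal sketch of the
flush-related settings that could be worth experimenting with; on
Linux these control how aggressively PostgreSQL asks the kernel (via
sync_file_range) to write back dirty data behind its writes, and 0
disables that hinting. Experimental values only, not recommendations:

checkpoint_flush_after = 0   # default 256kB on Linux; 0 disables forced writeback during checkpoint writes
bgwriter_flush_after = 0     # same mechanism for the background writer
backend_flush_after = 0      # same for ordinary backends (already 0 by default)

Whether disabling the writeback hinting helps or hurts depends on the
storage stack underneath.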
Below are the configuration values and environment details referenced above.
Machine:
Hyper-V VM
24 vCPU
80 GB RAM
Ubuntu 24.04.3 (kernel 6.8.0-88)
PostgreSQL 18.1
Database size:
~700 GB across two tablespaces on separate disks (freshly restored)
Storage layout:
Each disk is its own VHDX
LVM on battery-backed SSD array
XFS for PGDATA
Barriers disabled
Random write performance (steady state):
Median latency: 5.1 ms
IOPS: ~45.6k
Restore tests:
pg_restore custom format: ~2h+
pg_restore directory format: ~4h+
Parallelism tested with j = 1, 12, 18, 24
Representative checkpoint log entries:
(write phases ranging 76–1079 seconds, buffer writes up to 1.6M)
postgresql.conf (relevant parts):
shared_buffers = 20GB
work_mem = 150MB
maintenance_work_mem = 8GB
effective_io_concurrency = 1
maintenance_io_concurrency = 1
io_max_combine_limit = 512kB
io_combine_limit = 1024kB
io_method = io_uring
fsync = off
synchronous_commit = off
wal_sync_method = fdatasync
full_page_writes = off
wal_compression = lz4
checkpoint_timeout = 60min
checkpoint_completion_target = 0.9
max_wal_size = 80GB
min_wal_size = 10GB
effective_cache_size = 65GB
autovacuum = off
max_locks_per_transaction = 256
If anyone has encountered similar behavior or can recommend specific
PostgreSQL subsystems, kernel settings, or I/O patterns worth
investigating, I would be very grateful for advice. My main goal is to
understand why checkpoint writes are so slow relative to the
hardware’s demonstrated capabilities, and how to safely accelerate the
restore workflow.
Thank you in advance for any guidance.