parallel pg_restore blocks on heavy random read I/O on all children processes
От | Dimitrios Apostolou |
---|---|
Тема | parallel pg_restore blocks on heavy random read I/O on all children processes |
Дата | |
Msg-id | 6bd16bdb-aa5e-0512-739d-b84100596035@gmx.net обсуждение исходный текст |
Ответы |
Re: parallel pg_restore blocks on heavy random read I/O on all children processes
|
Список | pgsql-performance |
Hello list, I noticed the weird behaviour that doing a pg_restore of a huge database dump, leads to constant read I/O (at about 15K IOPS from the NVMe drive that has the dump file) for about one hour. I believe it happens with any -j value>=2. In particular, I get output like the following in the pg_restore log, only a few seconds after running it: pg_restore: launching item 12110 TABLE DATA yyy pg_restore: processing data for table "public.yyy" [ long pause ...] pg_restore: finished item 12110 TABLE DATA yyy The output varies a bit, depending on when each process comes to the bottleneck point, and it is hard to isolate each process' log since the output is not prefixed with the PID, so take it with a grain of salt. If I strace the pg_restore children, I see them read()ing 4K and seek()ing a couple blocks forward, all the time: read(4, "..."..., 4096) = 4096 lseek(4, 55544369152, SEEK_SET) = 55544369152 read(4, "..."..., 4096) = 4096 lseek(4, 55544381440, SEEK_SET) = 55544381440 read(4, "..."..., 4096) = 4096 lseek(4, 55544397824, SEEK_SET) = 55544397824 read(4, "..."..., 4096) = 4096 lseek(4, 55544414208, SEEK_SET) = 55544414208 read(4, "..."..., 4096) = 4096 lseek(4, 55544426496, SEEK_SET) = 55544426496 No other syscalls are occuring (no sends or writes whatsoever). After the ~1h pause, I don't notice the behaviour again during the many hours restore, even though there are plenty of larger tables in the dump. The normal restore activity resumes, during which the pg_restore processes are barely noticeable in the system, and the COPY postgres backend processes do most of the work. Here is the exact pg_restore command that reproduced the issue for me last on PostgreSQL 17: pg_restore -vvv -U $DBUSER -j8 -d $DBNAME --data-only --no-tablespaces --no-table-access-method --strict-names $DUMP_FILENAME And here is the pg_dump command which has created the dump file, executed on PostgreSQL 16. pg_dump -v --format=custom --compress=zstd --no-toast-compression $DBNAME | $send_to_remote_storage The database is multiple TBs large with a couple thousands tables (notable is one huge table with 1000 partitions). The zstd-compressed custom format dump file is 816 GB in size. What do you think causes this? Is it something that can be improved? Thanks in advance, Dimitris
В списке pgsql-performance по дате отправления: