Re: parallel pg_restore blocks on heavy random read I/O on all children processes
| From | Tom Lane |
|---|---|
| Subject | Re: parallel pg_restore blocks on heavy random read I/O on all children processes |
| Date | |
| Msg-id | 556498.1742744802@sss.pgh.pa.us |
| In reply to | Re: parallel pg_restore blocks on heavy random read I/O on all children processes (Dimitrios Apostolou <jimis@gmx.net>) |
| Responses | Re: parallel pg_restore blocks on heavy random read I/O on all children processes |
| | Re: parallel pg_restore blocks on heavy random read I/O on all children processes |
| List | pgsql-performance |
Dimitrios Apostolou <jimis@gmx.net> writes:
> On Thu, 20 Mar 2025, Tom Lane wrote:
>> I am betting that the problem is that the dump's TOC (table of
>> contents) lacks offsets to the actual data of the database objects,
>> and thus the readers have to reconstruct that information by scanning
>> the dump file.  Normally, pg_dump will back-fill offset data in the
>> TOC at completion of the dump, but if it's told to write to an
>> un-seekable output file then it cannot do that.

> Further questions:
> * Does the same happen in an uncompressed dump? Or maybe the offsets
>   are pre-filled because they are predictable without compression?

Yes; no.  We don't know the size of a table's data as-dumped until
we've dumped it.

> * Should pg_dump print some warning for generating a lower quality
>   format?

I don't think so.  In many use-cases this is irrelevant and the
warning would just be an annoyance.

> * The seeking pattern in pg_restore seems non-sensical to me: reading
>   4K, jumping 8-12K, repeat for the whole file? Consuming 15K IOPS
>   for an hour. /Maybe/ something to improve there... Where can I read
>   more about the format?

It's reading data blocks (or at least the headers thereof), which have
a limited size.  I don't think that size has changed since circa 1999,
so maybe we could consider increasing it; but I doubt we could move
the needle very far that way.

> * Why doesn't it happen in single-process pg_restore?

A single-process restore is going to restore all the data in the order
it appears in the archive file, so no seeking is required.  Of course,
as soon as you ask for parallelism, that doesn't work too well.

Hypothetically, maybe the algorithm for handing out tables-to-restore
to parallel workers could pay attention to the distance to the data
... except that in the problematic case we don't have that
information.  I don't recall for sure, but I think that the order of
the TOC entries is not necessarily a usable proxy for the order of the
data entries.  It's unclear to me that overriding the existing
heuristic (biggest tables first, I think) would be a win anyway.

			regards, tom lane
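[Editor's note] The offset back-filling that Tom describes can be illustrated with a toy archive format. This is *not* pg_dump's actual on-disk layout; the format, names, and sizes below are invented purely to show the mechanism: a TOC written up front with zeroed offsets, a back-fill pass that only a seekable output allows, and a reader that must otherwise walk every block header to find its table.

```python
import io
import struct

NAME_LEN = 16  # fixed-width table name, toy convention

def write_archive(tables, seekable=True):
    """Write a toy archive: a TOC of (name, offset) slots, then data
    blocks of (name, length, payload).  Offsets are back-filled only if
    the output is seekable, mimicking pg_dump's behavior."""
    buf = io.BytesIO()
    slot = {}  # TOC position of each table's offset field
    for name, _ in tables:
        slot[name] = buf.tell() + NAME_LEN
        buf.write(name.encode().ljust(NAME_LEN, b"\0"))
        buf.write(struct.pack("<Q", 0))  # offset unknown at dump time
    offs = {}
    for name, payload in tables:
        offs[name] = buf.tell()
        buf.write(name.encode().ljust(NAME_LEN, b"\0"))
        buf.write(struct.pack("<I", len(payload)) + payload)
    if seekable:  # back-fill pass at completion of the dump
        for name, off in offs.items():
            buf.seek(slot[name])
            buf.write(struct.pack("<Q", off))
    return buf.getvalue(), len(tables)

def read_table(data, ntables, want):
    """Return (payload, blocks_touched).  With a filled-in offset this
    is one direct seek; with zeroed offsets the reader must inspect
    block header after block header, skipping each block's data."""
    toc, pos = {}, 0
    for _ in range(ntables):
        name = data[pos:pos + NAME_LEN].rstrip(b"\0").decode()
        (off,) = struct.unpack_from("<Q", data, pos + NAME_LEN)
        toc[name] = off
        pos += NAME_LEN + 8
    touched = 0
    if toc[want]:
        pos = toc[want]  # offset known: seek straight to the block
    # else: pos already sits at the first data block; scan forward
    while True:
        name = data[pos:pos + NAME_LEN].rstrip(b"\0").decode()
        (n,) = struct.unpack_from("<I", data, pos + NAME_LEN)
        touched += 1
        body = pos + NAME_LEN + 4
        if name == want:
            return data[body:body + n], touched
        pos = body + n  # skip this block's payload, on to next header
```

With back-filled offsets, fetching the last table touches one block header; without them, the reader scans every preceding block first. Each parallel pg_restore worker doing such a scan independently over a large file is consistent with the heavy random-read pattern reported in this thread.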