Re: parallel pg_restore blocks on heavy random read I/O on all children processes
| From | Tom Lane |
|---|---|
| Subject | Re: parallel pg_restore blocks on heavy random read I/O on all children processes |
| Date | |
| Msg-id | 556498.1742744802@sss.pgh.pa.us |
| In reply to | Re: parallel pg_restore blocks on heavy random read I/O on all children processes (Dimitrios Apostolou <jimis@gmx.net>) |
| Responses | Re: parallel pg_restore blocks on heavy random read I/O on all children processes |
| | Re: parallel pg_restore blocks on heavy random read I/O on all children processes |
| List | pgsql-performance |
Dimitrios Apostolou <jimis@gmx.net> writes:
> On Thu, 20 Mar 2025, Tom Lane wrote:
>> I am betting that the problem is that the dump's TOC (table of
>> contents) lacks offsets to the actual data of the database objects,
>> and thus the readers have to reconstruct that information by scanning
>> the dump file.  Normally, pg_dump will back-fill offset data in the
>> TOC at completion of the dump, but if it's told to write to an
>> un-seekable output file then it cannot do that.

> Further questions:
> * Does the same happen in an uncompressed dump? Or maybe the offsets
>   are pre-filled because they are predictable without compression?

Yes; no.  We don't know the size of a table's data as-dumped until
we've dumped it.

> * Should pg_dump print some warning for generating a lower quality
>   format?

I don't think so.  In many use-cases this is irrelevant and the
warning would just be an annoyance.

> * The seeking pattern in pg_restore seems non-sensical to me: reading
>   4K, jumping 8-12K, repeat for the whole file? Consuming 15K IOPS
>   for an hour. /Maybe/ something to improve there... Where can I read
>   more about the format?

It's reading data blocks (or at least the headers thereof), which have
a limited size.  I don't think that size has changed since circa 1999,
so maybe we could consider increasing it; but I doubt we could move
the needle very far that way.

> * Why doesn't it happen in single-process pg_restore?

A single-process restore is going to restore all the data in the order
it appears in the archive file, so no seeking is required.  Of course,
as soon as you ask for parallelism, that doesn't work too well.

Hypothetically, maybe the algorithm for handing out tables-to-restore
to parallel workers could pay attention to the distance to the data
... except that in the problematic case we don't have that
information.  I don't recall for sure, but I think that the order of
the TOC entries is not necessarily a usable proxy for the order of the
data entries.  It's unclear to me that overriding the existing
heuristic (biggest tables first, I think) would be a win anyway.

			regards, tom lane
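[Editor's note] The offset back-filling that Tom describes can be illustrated with a toy archive format. This is *not* pg_dump's actual on-disk layout; the format, names, and sizes below are invented purely to show the mechanism: a TOC written up front with zeroed offsets, a back-fill pass that only a seekable output allows, and a reader that must otherwise walk every block header to find its table.

```python
import io
import struct

NAME_LEN = 16  # fixed-width table name, toy convention

def write_archive(tables, seekable=True):
    """Write a toy archive: a TOC of (name, offset) slots, then data
    blocks of (name, length, payload).  Offsets are back-filled only if
    the output is seekable, mimicking pg_dump's behavior."""
    buf = io.BytesIO()
    slot = {}  # TOC position of each table's offset field
    for name, _ in tables:
        slot[name] = buf.tell() + NAME_LEN
        buf.write(name.encode().ljust(NAME_LEN, b"\0"))
        buf.write(struct.pack("<Q", 0))  # offset unknown at dump time
    offs = {}
    for name, payload in tables:
        offs[name] = buf.tell()
        buf.write(name.encode().ljust(NAME_LEN, b"\0"))
        buf.write(struct.pack("<I", len(payload)) + payload)
    if seekable:  # back-fill pass at completion of the dump
        for name, off in offs.items():
            buf.seek(slot[name])
            buf.write(struct.pack("<Q", off))
    return buf.getvalue(), len(tables)

def read_table(data, ntables, want):
    """Return (payload, blocks_touched).  With a filled-in offset this
    is one direct seek; with zeroed offsets the reader must inspect
    block header after block header, skipping each block's data."""
    toc, pos = {}, 0
    for _ in range(ntables):
        name = data[pos:pos + NAME_LEN].rstrip(b"\0").decode()
        (off,) = struct.unpack_from("<Q", data, pos + NAME_LEN)
        toc[name] = off
        pos += NAME_LEN + 8
    touched = 0
    if toc[want]:
        pos = toc[want]  # offset known: seek straight to the block
    # else: pos already sits at the first data block; scan forward
    while True:
        name = data[pos:pos + NAME_LEN].rstrip(b"\0").decode()
        (n,) = struct.unpack_from("<I", data, pos + NAME_LEN)
        touched += 1
        body = pos + NAME_LEN + 4
        if name == want:
            return data[body:body + n], touched
        pos = body + n  # skip this block's payload, on to next header
```

With back-filled offsets, fetching the last table touches one block header; without them, the reader scans every preceding block first. Each parallel pg_restore worker doing such a scan independently over a large file is consistent with the heavy random-read pattern reported in this thread.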