Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
От | Dimitrios Apostolou |
---|---|
Тема | Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward |
Дата | |
Msg-id | 1po8os49-r63o-2923-p37n-12698o1qn7p0@tzk.arg обсуждение исходный текст |
Ответ на | Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward (Dimitrios Apostolou <jimis@gmx.net>) |
Список | pgsql-hackers |
On Saturday 2025-06-14 18:17, Dimitrios Apostolou wrote: > Out of curiosity I've tried the same with an uncompressed dump > (--compress=none). Surprisingly it seems the blocksize is even smaller. > > With my patched pg_restore I only get 4K reads and nothing else on the strace > output. > > read(4, "..."..., 4096) = 4096 > read(4, "..."..., 4096) = 4096 > read(4, "..."..., 4096) = 4096 > read(4, "..."..., 4096) = 4096 > read(4, "..."..., 4096) = 4096 > read(4, "..."..., 4096) = 4096 To clarify this output again, I have a huge uncompressed custom format dump without TOC (because pg_dump was writing to stdout), and at this point pg_restore is going through the whole archive to find the items it needs. Allow me to explain what goes on at this point since I have better insight now. The code in question, in pg_backup_custom.c: /* * Skip data from current file position. * Data blocks are formatted as an integer length, followed by data. * A zero length indicates the end of the block. */ static void _skipData(ArchiveHandle *AH) { lclContext *ctx = (lclContext *) AH->formatData; size_t blkLen; char *buf = NULL; int buflen = 0; blkLen = ReadInt(AH); while (blkLen != 0) { /* Sequential access is usually faster, so avoid seeking if the * jump forward is shorter than 1MB. */ if (ctx->hasSeek && blkLen > 1024 * 1024) { if (fseeko(AH->FH, blkLen, SEEK_CUR) != 0) pg_fatal("error during file seek: %m"); } else { if (blkLen > buflen) { free(buf); buf = (char *) pg_malloc(blkLen); buflen = blkLen; } if (fread(buf, 1, blkLen, AH->FH) != blkLen) { if (feof(AH->FH)) pg_fatal("could not read from input file: end of file"); else pg_fatal("could not read from input file: %m"); } } blkLen = ReadInt(AH); } free(buf); } blkLen is almost always a number around 35 to 38. So fread() is called all the time doing reads of about ~35 bytes. Then ReadInt() is actually doing getc() a few times. And it loops over. Libc is doing buffering of 4k, and that's how we end up seeing so many 4k reads. This also explains the ~80 lseek() between each 4k read() on the unpatched version, mentioned in previous email. I've tried setvbuf() like Thomas Munro suggested and I saw a significant improvement by allocating and using a 1MB buffer for libc stream buffering. Question that remains: where is pg_dump setting this ~35B length block? Is that easy to change without breaking old versions? Thanks in advance, Dimitris
В списке pgsql-hackers по дате отправления: