Re: Patch: dumping tables data in multiple chunks in pg_dump
| From | Dilip Kumar |
|---|---|
| Subject | Re: Patch: dumping tables data in multiple chunks in pg_dump |
| Date | |
| Msg-id | CAFiTN-scTeRAH0q2Ga3CLgkbcfcTi31cSw73ZVZntDQG7-fE+g@mail.gmail.com |
| In reply to | Re: Patch: dumping tables data in multiple chunks in pg_dump (Hannu Krosing <hannuk@google.com>) |
| List | pgsql-hackers |
On Tue, Nov 25, 2025 at 2:32 AM Hannu Krosing <hannuk@google.com> wrote:
>
> The expectation was that as chunking is useful mainly in case of
> really huge tables the analyze should have been run "recently enough".
>
> Maybe we should use pg_relation_size() in case we have already
> determined that the table is large enough to warrant chunking? Maybe
> at least 1/2 of the requested chunk size?
>
> My reasoning was to not put too much extra load on pg_dump in case
> chunking is not required. But of course we can use the presence of a
> chunking request to decide to run pg_relation_size(), assuming the
> overhead won't be too large in this case.
>
> On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk@google.com> wrote:
> > >
> > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > >
> > > Looking for feedback at this point:
> > > 1) what have I missed
> > > 2) should I implement something to avoid single-page chunks
> > >
> > > The flag --huge-table-chunk-pages which tells the directory format
> > > dump to dump tables where the main fork has more pages than this in
> > > multiple chunks of given number of pages,
> > >
> > > The main use case is speeding up parallel dumps in case of one or a
> > > small number of HUGE tables so parts of these can be dumped in
> > > parallel.
> >
> > +1 for the idea, I haven't done the detailed review but I was just
> > going through the patch, I noticed that we use pg_class->relpages to
> > identify whether to chunk the table or not, which should be fine but
> > don't you think if we use direct size calculation function like
> > pg_relation_size() we might get better idea and not dependent upon
> > whether the stats are updated or not? This will make chunking
> > behavior more deterministic.
Yeah, that makes sense. We can use relpages for the initial identification and then use pg_relation_size() to confirm once relpages says the table is large enough.

--
Regards,
Dilip Kumar
Google
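The hybrid check discussed above can be sketched as follows. This is a minimal illustration of the decision logic only, not code from the patch: the function name, threshold handling, and parameters are hypothetical, and `actual_size_bytes` stands in for the result of a `pg_relation_size()` call that in pg_dump would only be issued after the cheap `pg_class.relpages` test passes.

```python
BLCKSZ = 8192  # default PostgreSQL block size


def should_chunk(relpages, actual_size_bytes, chunk_pages):
    """Decide whether to dump a table in chunks (illustrative sketch).

    relpages         -- pg_class.relpages, a planner estimate that may be stale
    actual_size_bytes -- what pg_relation_size() would return (exact)
    chunk_pages      -- the --huge-table-chunk-pages setting; 0 disables chunking
    """
    if chunk_pages <= 0:
        return False  # chunking was not requested at all
    if relpages <= chunk_pages:
        return False  # stats say the table is small: skip the exact-size call
    # Stats say "large enough": confirm with the exact size, so stale,
    # overestimated statistics do not trigger unnecessary chunking.
    return actual_size_bytes > chunk_pages * BLCKSZ


# Example: with a 1000-page chunk size, a table whose stats and actual
# size both exceed the threshold is chunked; one whose stats are stale
# and whose real size is small is not.
print(should_chunk(2000, 2000 * BLCKSZ, 1000))  # True
print(should_chunk(2000, 500 * BLCKSZ, 1000))   # False (stale, inflated stats)
print(should_chunk(100, 100 * BLCKSZ, 1000))    # False (small table)
```

The point of the two-step test is that the exact-size query is only paid for tables that the statistics already flag as chunking candidates, keeping the common case cheap.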