Re: Patch: dumping tables data in multiple chunks in pg_dump

From	Hannu Krosing
Subject	Re: Patch: dumping tables data in multiple chunks in pg_dump
Date	25 November 00:02:15
Msg-id	CAMT0RQQPMj3=EZ-4z6qRs_TmBHoyv2VHAdMrfDuwa5ZUY6XtHQ@mail.gmail.com
In reply to	Re: Patch: dumping tables data in multiple chunks in pg_dump (Dilip Kumar <dilipbalaut@gmail.com>)
Responses	Re: Patch: dumping tables data in multiple chunks in pg_dump
List	pgsql-hackers

The expectation was that, since chunking is useful mainly for really
huge tables, ANALYZE should have been run "recently enough".

Maybe we should call pg_relation_size() only once relpages already
suggests that the table is large enough to warrant chunking? Say, when
it is at least 1/2 of the requested chunk size?

My reasoning was to not put too much extra load on pg_dump in case
chunking is not required. But of course we can use the presence of a
chunking request to decide to run pg_relation_size(), assuming the
overhead won't be too large in this case.
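
A minimal sketch of that decision, under one reading of the proposal
above: the exact size is fetched only when chunking was requested and
the relpages estimate is at least half the requested chunk size. This is
not code from the patch. The function names, the exact_size_bytes hook
(standing in for a pg_relation_size() query) and the 8192-byte block
size are assumptions made for illustration.

```python
BLOCK_SIZE = 8192  # assumed block size; PostgreSQL's default BLCKSZ


def pages_for_chunking(relpages, chunk_pages, exact_size_bytes):
    """Return the page count to base the chunking decision on.

    relpages         -- pg_class.relpages estimate (may be stale)
    chunk_pages      -- requested chunk size in pages, or None
    exact_size_bytes -- zero-argument callable standing in for a
                        pg_relation_size() query (hypothetical hook)
    """
    if chunk_pages is None:
        # No chunking requested: never pay for the extra size query.
        return relpages
    if relpages * 2 < chunk_pages:
        # Estimate is below half the chunk size: trust it as-is.
        return relpages
    # Estimate is at least half the chunk size: ask for the real size
    # and round up to whole pages (ceiling division).
    return -(-exact_size_bytes() // BLOCK_SIZE)


def should_chunk(relpages, chunk_pages, exact_size_bytes):
    """True if the table has more pages than the requested chunk size."""
    if chunk_pages is None:
        return False
    pages = pages_for_chunking(relpages, chunk_pages, exact_size_bytes)
    return pages > chunk_pages
```

With this shape, a dump without a chunking request issues no extra
query at all, and a dump with one issues it only for tables whose
estimate is already in the neighbourhood of the threshold. A table whose
relpages is badly stale and far too low would still be missed.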

On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk@google.com> wrote:
> >
> > Attached is a patch that adds the ability to dump table data in multiple chunks.
> >
> > Looking for feedback at this point:
> >  1) what have I missed
> >  2) should I implement something to avoid single-page chunks
> >
> > The flag --huge-table-chunk-pages tells the directory format dump
> > to dump tables whose main fork has more pages than this in multiple
> > chunks of the given number of pages.
> >
> > The main use case is speeding up parallel dumps in case of one or a
> > small number of HUGE tables so parts of these can be dumped in
> > parallel.
> >
>
> +1 for the idea. I haven't done a detailed review, but while going
> through the patch I noticed that we use pg_class->relpages to decide
> whether to chunk the table. That should be fine, but don't you think
> that a direct size calculation function like pg_relation_size() would
> give a better idea and would not depend on whether the stats are up
> to date?  This would make chunking behavior more deterministic.
>
> --
> Regards,
> Dilip Kumar
> Google
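
On question 2 in the quoted mail (avoiding single-page chunks), one
possible approach is to fold a too-small trailing chunk into the chunk
before it. The sketch below only illustrates that idea and is not taken
from the patch. The function name, the min_tail_pages parameter and the
inclusive (first_page, last_page) representation are all assumptions.

```python
def chunk_page_ranges(total_pages, chunk_pages, min_tail_pages=2):
    """Split pages [0, total_pages) into inclusive (first, last) ranges.

    A trailing chunk smaller than min_tail_pages is folded into the
    previous chunk. That rule is an assumption for illustration, not
    behaviour taken from the patch.
    """
    if chunk_pages <= 0:
        raise ValueError("chunk_pages must be positive")
    if total_pages <= chunk_pages:
        # Not above the threshold: dump the table as a single piece.
        return [(0, max(total_pages - 1, 0))]

    ranges = []
    start = 0
    while start < total_pages:
        end = min(start + chunk_pages, total_pages) - 1
        ranges.append((start, end))
        start = end + 1

    # Merge an undersized tail into its predecessor.
    first, last = ranges[-1]
    if last - first + 1 < min_tail_pages:
        ranges.pop()
        prev_first, _ = ranges.pop()
        ranges.append((prev_first, last))
    return ranges


if __name__ == "__main__":
    # 2001 pages in chunks of 1000: the 1-page tail joins the 2nd chunk.
    print(chunk_page_ranges(2001, 1000))
```

The cost of this rule is that the last chunk can be slightly larger
than the requested size, which seems preferable to scheduling a
separate job for one page.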
