Re: WIP/PoC for parallel backup

From	Stephen Frost
Subject	Re: WIP/PoC for parallel backup
Date	23 August 2019, 20:26:38
Msg-id	20190823172637.GA16436@tamriel.snowman.net
In response to	Re: WIP/PoC for parallel backup (Asif Rehman <asifr.rehman@gmail.com>)
Responses	Re: WIP/PoC for parallel backup (Ibrar Ahmed <ibrar.ahmad@gmail.com>) Re: WIP/PoC for parallel backup (Ahsan Hadi <ahsan.hadi@gmail.com>)
List	pgsql-hackers

Greetings,

* Asif Rehman (asifr.rehman@gmail.com) wrote:
> On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
> > Interesting proposal.  The bulk of the work in a backup is transferring files
> > from source data directory to destination.  Your patch is breaking this
> > task down in multiple sets of files and transferring each set in parallel.
> > This seems correct, however, your patch is also creating a new process to
> > handle each set.  Is that necessary?  I think we should try to achieve this
> > using multiple asynchronous libpq connections from a single basebackup
> > process.  That is, use the PQconnectStartParams() interface instead of
> > PQconnectdbParams(), which is currently used by basebackup.  On the server
> > side, it may still result in multiple backend processes per connection, and
> > an attempt should be made to avoid that as well, but it seems complicated.
>
> Thanks Asim for the feedback. This is a good suggestion. The main idea I
> wanted to discuss is the design where we can open multiple backend
> connections to get the data instead of a single connection.
> On the client side we can have multiple approaches. One is to use
> asynchronous APIs (as suggested by you) and the other could be to decide
> between multi-process and multi-thread. The main point was that we can get
> a lot of performance benefit by using multiple connections, and I built
> this POC to float the idea of how the parallel backup can work, since the
> core logic of getting the files using multiple connections will remain the
> same, whether we use asynchronous, multi-process or multi-threaded.
>
> I am going to address the division of files so that they are distributed
> evenly among multiple workers based on file sizes. That would allow us to
> get some concrete numbers, and it will also allow us to gauge the benefits
> of the async versus multi-process/thread approach on the client side.

I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.

Thanks,

Stephen
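
Illustration: the size-based division of files among workers mentioned in the quoted message could be sketched roughly as below. This is not taken from the patch; it assumes a simple greedy "largest file first" heuristic, and the function name distribute_files, the (path, size) tuple layout, and the sample paths are all hypothetical.

```python
import heapq


def distribute_files(files, num_workers):
    """Split (path, size_in_bytes) pairs into num_workers lists of paths.

    Hypothetical helper: files are handed out largest first, each one going
    to the worker that currently has the fewest bytes assigned.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Min-heap of (bytes_assigned_so_far, worker_index).
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(num_workers)]

    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        assigned, worker = heapq.heappop(heap)
        assignments[worker].append(path)
        heapq.heappush(heap, (assigned + size, worker))

    return assignments


if __name__ == "__main__":
    # Made-up relative paths and sizes, for illustration only.
    sample = [("f1", 8), ("f2", 7), ("f3", 6), ("f4", 5), ("f5", 4)]
    for worker, paths in enumerate(distribute_files(sample, 2)):
        print(worker, paths)
```

The heuristic does not guarantee an optimal split, but it keeps the per-worker totals close when there are many files, and it is independent of whether the client ends up using async connections, processes, or threads.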

Attachments

signature.asc
