Re: WIP/PoC for parallel backup

From Ibrar Ahmed
Subject Re: WIP/PoC for parallel backup
Date
Msg-id CALtqXTcsT5aoxcKK1shs+r37LO6ToPJ8feztSH6w-R0zMuQD2g@mail.gmail.com
In reply to Re: WIP/PoC for parallel backup  (Asim R P <apraveen@pivotal.io>)
List pgsql-hackers


On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
Hi Asif

Interesting proposal.  The bulk of the work in a backup is transferring files from the source data directory to the destination.  Your patch breaks this task down into multiple sets of files and transfers each set in parallel.  This seems correct; however, your patch also creates a new process to handle each set.  Is that necessary?  I think we should try to achieve this using multiple asynchronous libpq connections from a single basebackup process.  That is, use the PQconnectStartParams() interface instead of PQconnectdbParams(), which is currently used by basebackup.  On the server side, it may still result in multiple backend processes, one per connection, and an attempt should be made to avoid that as well, but it seems complicated.

What do you think?

The main question is what we really want to solve here. What is the
bottleneck, and which hardware do we want to saturate? I ask because
multiple hardware resources are involved in taking a backup (network, CPU, disk). If we
have already saturated the disk, then there is no need to add parallelism because
we will be blocked on disk I/O anyway. I implemented parallel backup in a separate
application and got wonderful results. I have only skimmed through the code, and I have
some reservations: creating a separate process only for copying data is overkill.
There are two options: non-blocking calls or worker threads.
But before doing that, we need to identify the pg_basebackup bottleneck; after that, we
can see what the best way to solve it is. Some numbers would help to understand the
actual benefit.
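
To make the worker-thread option and the request for numbers concrete, here is a
minimal sketch in Python. It is not pg_basebackup code and does not use libpq or the
replication protocol; it only copies a local directory tree with a configurable
number of threads and times each run. The names copy_tree_parallel and benchmark
are hypothetical helpers invented for this illustration.

```python
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src: Path, dst: Path, workers: int) -> int:
    """Copy every regular file under src into dst using `workers` threads.

    Hypothetical helper for illustration only. Returns total bytes copied.
    """
    files = [p for p in src.rglob("*") if p.is_file()]

    # Create the target directories up front, in a single thread, so the
    # workers only ever copy file contents.
    for directory in {dst / p.relative_to(src).parent for p in files}:
        directory.mkdir(parents=True, exist_ok=True)

    def copy_one(path: Path) -> int:
        shutil.copyfile(path, dst / path.relative_to(src))
        return path.stat().st_size

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(copy_one, files))


def benchmark(src: Path, worker_counts=(1, 2, 4, 8)) -> dict:
    """Time a full copy of src for each worker count (seconds per count)."""
    timings = {}
    for n in worker_counts:
        with tempfile.TemporaryDirectory() as tmp:
            start = time.perf_counter()
            copy_tree_parallel(src, Path(tmp), n)
            timings[n] = time.perf_counter() - start
    return timings
```

If the timings stop improving as the worker count grows, the disk (or whichever
resource is the bottleneck) is probably already saturated and extra parallelism buys
nothing. Note that the OS page cache can make repeated runs over the same files look
faster than they really are, so numbers from a sketch like this are only a rough guide.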


--
Ibrar Ahmed
