Re: WIP/PoC for parallel backup

From: Robert Haas
Subject: Re: WIP/PoC for parallel backup
Msg-id: CA+TgmoZDQ+go5tgjVLF1DitrdYnRa8fTziDxC_mqk5Vy9if8TA@mail.gmail.com
In response to: Re: WIP/PoC for parallel backup (Asif Rehman <asifr.rehman@gmail.com>)
Responses: Re: WIP/PoC for parallel backup
List: pgsql-hackers
On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in the given list.
>> > pg_basebackup will then send back a list of filenames in this command. This command will be sent by each worker,
>> > and that worker will receive the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially; however, I adopted the current strategy to avoid multiple round trips between
> the server and clients and to save on query processing time by issuing a single command rather than multiple ones.
> Further, fetching multiple files at once will also aid in supporting the tar format by utilising the existing
> ReceiveTarFile() function, and will make it possible to create one tarball per tablespace per worker.

I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.
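
To make the round-trip argument concrete, here is a rough cost model in Python. It is only a sketch: the function name, the 1 ms round trip, the 100 MB/s link, and the file counts are all assumed numbers chosen for illustration, not measurements from any real backup.

```python
def transfer_time(num_files, total_bytes, rtt_s, bandwidth_bps, files_per_request):
    """Estimated seconds: one round trip per request plus raw transfer time."""
    requests = -(-num_files // files_per_request)  # ceiling division
    return requests * rtt_s + total_bytes / bandwidth_bps


RTT = 0.001          # assumed 1 ms round trip
BANDWIDTH = 100e6    # assumed 100 MB/s link

# Many small files: 100,000 files of 8 kB each (hypothetical workload).
small_one_by_one = transfer_time(100_000, 100_000 * 8192, RTT, BANDWIDTH, 1)
small_batched = transfer_time(100_000, 100_000 * 8192, RTT, BANDWIDTH, 100)

# Few big files: 100 files of 1 GB each (hypothetical workload).
big_one_by_one = transfer_time(100, 100 * 10**9, RTT, BANDWIDTH, 1)
big_batched = transfer_time(100, 100 * 10**9, RTT, BANDWIDTH, 100)

print(small_one_by_one, small_batched)
print(big_one_by_one, big_batched)
```

Under these assumptions the small-file case spends most of its time on round trips when files are requested one at a time, while for the big-file case the per-request overhead is a tiny fraction of the total either way.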

However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.

To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.
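
A toy simulation of that effect, again in Python and under made-up assumptions: four workers, one of which runs at a quarter of the speed of the others, copying eight equally sized files. The function names and all numbers are hypothetical, and round-trip cost is ignored so that only the load-balancing effect shows up.

```python
import heapq


def static_makespan(file_sizes, speeds):
    """Files are dealt out round-robin up front; each worker copies its whole batch."""
    loads = [0.0] * len(speeds)
    for i, size in enumerate(file_sizes):
        loads[i % len(speeds)] += size
    return max(load / speed for load, speed in zip(loads, speeds))


def dynamic_makespan(file_sizes, speeds):
    """Each worker asks for one more file as soon as it becomes idle."""
    # Min-heap of (time at which the worker becomes idle, worker index).
    idle = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle)
    finish = 0.0
    for size in file_sizes:
        t, w = heapq.heappop(idle)
        done = t + size / speeds[w]
        finish = max(finish, done)
        heapq.heappush(idle, (done, w))
    return finish


sizes = [1.0] * 8                 # eight 1 GB files (sizes in GB)
speeds = [1.0, 1.0, 1.0, 0.25]    # GB/s; the last worker is slow

print(static_makespan(sizes, speeds))   # 8.0
print(dynamic_makespan(sizes, speeds))  # 4.0
```

With the fixed up-front split, the slow worker is stuck with two files and the backup takes 8 seconds; when workers fetch one file at a time, the fast workers absorb the extra files and the whole thing finishes in 4.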

> Ideally, I would like to support the tar format as well, which would be much easier to implement when fetching
> multiple files at once, since that would enable the existing functionality to be used without much change.

I think we should just have the client generate the tarfile. It'll
require duplicating some code, but it's not actually that much code or
that complicated from what I can see.
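
As a rough illustration of how little is involved in building an archive on the receiving side, here is a sketch using Python's standard tarfile module. This is not pg_basebackup code (which is C) and does not reflect how the patch would do it; build_tar() and the file names are hypothetical, and the sketch assumes each file arrives as a (name, bytes) pair.

```python
import io
import tarfile


def build_tar(files):
    """Pack (name, bytes) pairs, as received from the server, into a tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)          # the header must carry the exact length
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Illustrative file names and contents only.
received = [("global/pg_control", b"\x00" * 16), ("base/1/1259", b"abc")]
archive = build_tar(received)

with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    print(tar.getnames())
```

Each member is just a fixed-size header followed by the file data padded to a block boundary, so a worker can append files to its own archive in whatever order it happens to receive them.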


--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


