Re: WIP/PoC for parallel backup

From Rushabh Lathia
Subject Re: WIP/PoC for parallel backup
Msg-id CAGPqQf0Ehh-jxGRgYAk7j0oPRrW2Xk_d+h7f9yykznN2ewG=dQ@mail.gmail.com
In reply to Re: WIP/PoC for parallel backup  (Ahsan Hadi <ahsan.hadi@gmail.com>)
Responses Re: WIP/PoC for parallel backup
List pgsql-hackers


On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:


On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:


On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing of this parallel backup to see how beneficial it is; the results are below. In this testing, we ran the backup:
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We ran those tests on two setups:
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups were performed, and which was used as the server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> With client and server on the same machine, the results show around 50% improvement in the parallel runs with 4 and 8 workers.  We don't see a large additional improvement as more workers are added.
>
>
> Whereas, when the client and server are on different machines, we don't see any major benefit in performance.  These results match the testing results posted by David Zhang upthread.
>
>
>
> We ran the test for a 100GB backup with 4 parallel workers to see the CPU usage and other information. What we noticed is that the server is consuming almost 100% CPU the whole time, and pg_stat_activity shows that the server is busy with ClientWrite most of the time.
>
>

Was this for a setup where the client and server were on the same
machine or where the client was on a different machine?  If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?

In that particular setup, the client and server were on different machines.


During an offlist discussion with Robert, he pointed out that current
basebackup's code doesn't account for the wait event for the reading
of files which can change what pg_stat_activity shows?  Can you please
apply his latest patch to improve basebackup.c's code [1] which will
take care of that waitevent before getting the data again?

[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com


Sure, we can try this out and do a similar run to collect the pg_stat_activity output.

Have you had the chance to try this out?

Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.

Wait events were captured every 3 seconds during the backup for:
1: Parallel backup for 100GB data with 4 workers (pg_stat_activity_j4_100GB.txt)
2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_normal_backup_100GB.txt)
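
For anyone who wants to repeat this kind of capture, here is a minimal sketch of the sampling loop in Python. It is not the script used for the runs above. The query, the filter on backend_type = 'walsender', and the helper name sample_wait_events are assumptions made for illustration; fetch_rows stands in for whatever executes the query against the server.

```python
import time
from collections import Counter

# Assumed query (not from the original runs): base backups are served by
# walsender backends, so only those rows are sampled.
QUERY = """
SELECT wait_event
FROM pg_stat_activity
WHERE backend_type = 'walsender'
"""


def sample_wait_events(fetch_rows, samples, interval=3.0, sleep=time.sleep):
    """Call fetch_rows() `samples` times and count every wait_event seen.

    fetch_rows is a hypothetical callable that runs QUERY and returns an
    iterable of one-column rows, e.g. [("ClientWrite",), (None,)].
    """
    counts = Counter()
    for i in range(samples):
        for (wait_event,) in fetch_rows():
            # wait_event is NULL (None) when the backend is not waiting.
            counts[wait_event or "(not waiting)"] += 1
        if i < samples - 1:
            sleep(interval)  # wait before taking the next snapshot
    return counts
```

With several backup processes running, each snapshot contributes one row per process, so a per-event count can exceed the number of snapshots taken.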

Here are the observations:

The total number of events (pg_stat_activity) captured during the above runs:
- 314 events for normal backups
- 316 events for parallel backups (-j 4)

BaseBackupRead wait event numbers (newly added wait event):
37 - in normal backups
25 - in the parallel backup (-j 4)

ClientWrite wait event numbers:
175 - in normal backup
1098 - in parallel backups

ClientRead wait event numbers:
0 - in normal backup
326 - in parallel backups, across different processes (all in idle state)
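
To put the counts above side by side, here is a small arithmetic sketch in Python. It uses only the numbers reported in this mail; the dictionary layout and the function name write_to_read_ratio are just for illustration.

```python
# Wait event counts as reported above.
counts = {
    "normal": {"BaseBackupRead": 37, "ClientWrite": 175, "ClientRead": 0},
    "parallel_j4": {"BaseBackupRead": 25, "ClientWrite": 1098, "ClientRead": 326},
}


def write_to_read_ratio(run):
    """ClientWrite samples per BaseBackupRead sample for the given run."""
    c = counts[run]
    return c["ClientWrite"] / c["BaseBackupRead"]


for run in counts:
    ratio = write_to_read_ratio(run)
    print(f"{run}: {ratio:.1f} ClientWrite samples per BaseBackupRead sample")
# normal: 4.7 ClientWrite samples per BaseBackupRead sample
# parallel_j4: 43.9 ClientWrite samples per BaseBackupRead sample
```

So in both runs the sampled backends were waiting on ClientWrite far more often than on BaseBackupRead, and the gap is much wider in the parallel run.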




Thanks,

Rushabh Lathia