Re: WIP/PoC for parallel backup

From: Hamid Akhtar
Subject: Re: WIP/PoC for parallel backup
Msg-id: CANugjhuKbcRwZfqHepxm7uKNjEk6QRuSQaM1Om_6okJDjOwevA@mail.gmail.com
In reply to: Re: WIP/PoC for parallel backup (Suraj Kharage <suraj.kharage@enterprisedb.com>)
Responses: Re: WIP/PoC for parallel backup (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
As far as I understand, parallel backup is not a mandatory performance feature; rather, it is one used at the user's discretion. This IMHO indicates that it will benefit some users and may not benefit others.

Taking a backup is an I/O-intensive workload, so parallelizing it across multiple worker threads/processes creates an overhead of its own. So what precisely are we optimizing here? Looking at a running database system in any environment, I see the following potential scenarios playing out. These are probably clear to everyone here, but I'm listing them for completeness and clarity.

Locally Running Backup:
(1) Server has no clients connected other than base backup.
(2) Server has other clients connected which are actively performing operations causing disk I/O.

Remotely Running Backup:
(3) Server has no clients connected other than remote base backup.
(4) Server has other clients connected which are actively performing operations causing disk I/O.

Others:
(5) Server or the system running base backup has other processes competing for disk or network bandwidth.

Generally speaking, I see that parallelization could potentially benefit scenarios (2), (4) and (5), the reason being that having more than one thread increases the likelihood that the backup will get a bigger time slice for disk I/O and network bandwidth. With (1) and (3), since there are no competing processes, adding multiple threads or processes will only increase CPU overhead while still getting the same network and disk time slice. In that case, performance will degrade.
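To make this concrete, here is a minimal back-of-the-envelope model (my own sketch, not anything from the patch) that assumes an idealized fair I/O scheduler dividing a fixed device bandwidth equally among all active processes, with the backup workers' slices adding up and CPU overhead ignored:

```python
def backup_time(total_bytes, bandwidth, backup_workers, competing_procs):
    """Estimated wall-clock backup time under an idealized fair scheduler:
    every active process gets an equal slice of the device bandwidth, and
    the backup workers' slices add up (CPU overhead is ignored)."""
    backup_share = backup_workers / (backup_workers + competing_procs)
    return total_bytes / (bandwidth * backup_share)

# Scenarios (1)/(3): no competing processes -- extra workers change nothing.
print(backup_time(100e9, 100e6, 1, 0))  # 1000.0 seconds
print(backup_time(100e9, 100e6, 4, 0))  # 1000.0 seconds

# Scenarios (2)/(4)/(5): four competing processes -- more workers help.
print(backup_time(100e9, 100e6, 1, 4))  # 5000.0 seconds
print(backup_time(100e9, 100e6, 4, 4))  # 2000.0 seconds
```

Under this model, parallelism only shortens the backup when there is competing load to win time slices from; with an idle device, the best case is breaking even, and in reality the per-worker overhead makes it slightly worse.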

IMHO, that's why adding other load on the server, for instance by running pgbench simultaneously, may show improved performance for parallel backup. Likewise, running parallel backup on a local laptop more often than not yields improved performance.

There are obviously other factors that may impact performance, such as the type of I/O scheduler in use, whether CFQ or some other.
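On Linux, the scheduler in effect for a block device can be read from /sys/block/&lt;dev&gt;/queue/scheduler, where the kernel marks the active one with brackets (e.g. "noop deadline [cfq]"). A small sketch of parsing that line (the helper name is mine):

```python
def active_scheduler(sysfs_line):
    """Return the active I/O scheduler from the contents of
    /sys/block/<dev>/queue/scheduler; the active one is bracketed."""
    for token in sysfs_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return sysfs_line.strip()  # single-scheduler kernels print it bare

print(active_scheduler("noop deadline [cfq]"))  # cfq
```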

IMHO, parallel backup has obvious performance benefits, but we need to ensure that users understand there is potential for a slower backup when there is no competition for resources.



On Fri, May 22, 2020 at 11:03 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:

On Thu, May 21, 2020 at 7:12 PM Robert Haas <robertmhaas@gmail.com> wrote:

So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.

Do we have the same results for the local backup case, where the patch helped?

Here is the result for local backup case (100GB data). Attaching the captured logs.

The total number of events (pg_stat_activity) captured during local runs:
- 82 events for normal backups
- 31 events for parallel backups (-j 4)

BaseBackupRead wait event numbers: (newly added)
24 - in normal backups
14 - in parallel backup (-j 4)

ClientWrite wait event numbers:
8 - in normal backup
43 - in parallel backups

ClientRead wait event numbers:
0 - ClientRead in normal backup
32 - ClientRead in parallel backups for diff processes.


--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.


--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus
